Abstract


We propose a new estimator, the thresholded scaled Lasso, in high dimensional threshold
regressions. First, we establish an upper bound on the $\ell_\infty$ estimation error of the scaled Lasso
estimator of Lee et al. (2012). This is a non-trivial task as the literature on high-dimensional
models has focused almost exclusively on $\ell_1$ and $\ell_2$ estimation errors. We show that this
sup-norm bound can be used to distinguish between zero and non-zero coefficients at a much finer
scale than would have been possible using classical oracle inequalities. Thus, our sup-norm
bound is tailored to consistent variable selection via thresholding.

Our simulations show that thresholding the scaled Lasso yields substantial improvements in 
terms of variable selection. Finally, we use our estimator to shed further empirical light on the 
long running debate on the relationship between the level of debt (public and private) and GDP 
growth. 
1 Introduction


Threshold models have been heavily studied and used in the past twenty years or so. In econometrics
the seminal articles by Hansen (1996) and Hansen (2000) showed that least squares estimation of
threshold models is possible and feasible. These papers show how to test for the presence of a
threshold and how to estimate the remaining parameters by least squares. Later, Caner and Hansen
(2004) provided instrumental variable estimation of the threshold. These authors derived the limits
for the threshold parameter in the reduced form as well as structural equations.

There have been many applications of threshold models in cross-section data. One of the most
recent ones is the analysis of the public debt to GDP ratio in a threshold regression model by
Caner et al. In the context of time series we refer to the articles by Caner and Hansen,
Seo (2008), and Hansen and Seo (2002). Lin (2014) considers the adaptive
Lasso in a high dimensional quantile threshold model. In panel data, semi-parametrics, and least
absolute deviation models, Hansen (1999), Linton and Seo (2007), and Caner (2002), respectively, made
contributions. For applications to stock markets and exchange rates we refer to Akdeniz et al.
and Basci and Caner. These authors argue that threshold models can contribute to
reducing forecast errors.

To be precise, we shall study the model 


\[ Y_i = X_i'\beta_0 + X_i'\delta_0 1\{Q_i < \tau_0\} + U_i, \qquad i = 1,\dots,n, \qquad (1) \]


where $\beta_0, \delta_0 \in \mathbb{R}^m$ and $\tau_0$ determines the location of the threshold/break. $Q_i$ determines which
regime we are in and could be the debt level in a growth regression or education in a wage regression.
If $\delta_0 = 0$, there is no break and $\tau_0$ is not identified. In that case the model is linear. In a very
insightful recent paper Lee et al. (2012) proved finite sample oracle inequalities for the prediction
and estimation error of the (scaled) Lasso applied to (1) in the case of fixed regressors and Gaussian
error terms. In their simulation section, they also extend their results to random regressors with
Gaussian errors. Furthermore, they nicely showed that $\hat\tau$ exhibits the well known super efficiency
phenomenon from low dimensional break point models even in the high-dimensional case. These
authors also show that the scaled Lasso does not select too many irrelevant variables in the spirit of
Bickel et al. (2009). However, their results are by no means trivial extensions of oracle inequalities
for linear models as they show that the classical restricted eigenvalue condition must hold uniformly
over the parameter space in threshold models. In addition, the probabilistic analysis is also much
more refined than in the linear case.

The aim of this paper is to show that it is possible to consistently decide whether a break is
present or not even in the high-dimensional change point model with random regressors. In other
words, we show that it is possible to decide whether $\delta_0 = 0$ or if it possesses non-zero entries. To do
so efficiently, we first establish an upper bound on the sup-norm convergence rate of the estimator
$\hat\delta$ of $\delta_0$ which is valid even in highly correlated designs. This is not an easy task as almost all
previous work has focussed on establishing upper bounds on the $\ell_1$ or $\ell_2$ estimation error in the
plain linear model. Exceptions are Lounici (2008) and van de Geer (2014) who provide sup-norm
bounds in the high-dimensional linear model. To the best of our knowledge, we are the first to
establish sup-norm bounds on the estimation error in a high-dimensional non-linear model. Our
sup-norm bound is much smaller than the corresponding $\ell_1$ and $\ell_2$ bounds on the estimation error
as it does not depend on the unknown number of non-zero coefficients $s$. Thus, our approach to
break detection, which is based on thresholding, allows for a much finer distinction between zero
and non-zero entries of $\delta_0$. The result is that we can detect breaks which would be too small to
detect if one thresholded based on the classical $\ell_1$ or $\ell_2$ estimation error. In that sense, the sharp
sup-norm bound is tailored to break detection in our context and we strengthen the result of selecting
not too many irrelevant variables in the threshold model to selecting exactly the right ones with
probability tending to one.

The debate regarding the impact of debt on GDP growth was recently reignited by the European
public debt crisis as well as the claim by Reinhart and Rogoff (2010) that public debt has a substantial
negative effect on future GDP growth when the ratio of debt to GDP is over 90%. Following
Reinhart and Rogoff (2010), several authors have econometrically investigated the presence of such
a threshold. Of particular interest for us is the work of Cecchetti et al. (2012) who estimated
threshold growth regressions using several measures of public and private debt as well as a set of
standard controls. Using our thresholded Lasso estimator with the data of Cecchetti et al. (2012)
we find robust evidence of a threshold in the effect of debt on future GDP growth. However, the
effect of debt being above the threshold appears to be complex.


In Section 2 we recall the scaled Lasso estimator for threshold models of Lee et al. (2012).
Section 3 establishes $\ell_\infty$ norm bounds for the estimation error of the scaled Lasso. This sup-norm
bound is the basis for our new thresholded scaled Lasso estimator which is introduced in Section 4.
Section 5 provides simulations supporting the selection consistency of our estimator. Section 6
reports the results of our growth regressions. All proofs are deferred to the appendix.


1.1 Notation 

For any vector $x \in \mathbb{R}^k$ (for some $k \ge 1$), let $\|x\|_{\ell_1}$, $\|x\|_{\ell_2}$ and $\|x\|_{\ell_\infty}$ denote the $\ell_1$, $\ell_2$ and
$\ell_\infty$ norms, respectively. Similarly, for any $m \times n$ matrix $A$, $\|A\|_{\ell_1}$, $\|A\|_{\ell_2}$ and $\|A\|_{\ell_\infty}$ denote the
induced (operator) norms corresponding to the above three norms. They can be calculated as
$\|A\|_{\ell_1} = \max_{1 \le j \le n} \sum_{i=1}^m |A_{ij}|$, $\|A\|_{\ell_2} = \sqrt{\phi_{\max}(A'A)}$ where $\phi_{\max}(\cdot)$ is the maximal eigenvalue,
and $\|A\|_{\ell_\infty} = \max_{1 \le i \le m} \sum_{j=1}^n |A_{ij}|$, respectively. We will also need $\|A\|_\infty = \max_{i,j} |A_{ij}|$ where the
maximum extends over all entries of $A$. For real numbers $a, b$, $a \vee b$ and $a \wedge b$ denote their maximum and
minimum, respectively. Furthermore, the empirical norm of $y \in \mathbb{R}^n$ is given by $\|y\|_n = \sqrt{n^{-1}\sum_{i=1}^n y_i^2}$.

We shall say that a real random variable $Z$ is subgaussian if there exist positive constants $A$
and $B$ such that $P(|Z| > t) \le A e^{-Bt^2}$ for all $t > 0$. $Z$ is said to be subexponential if there exist
positive constants $C$ and $D$ such that $P(|Z| > t) \le C e^{-Dt}$ for all $t > 0$. For $x \in \mathbb{R}^m$, we will let
$x^{(j)}$ denote its $j$th entry. Let "wpa1" denote with probability approaching one.
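The matrix norms above are easy to confuse, so a small numerical check may help. The following is a minimal sketch in Python with NumPy (not part of the original paper; the function names `induced_norms` and `empirical_norm` are illustrative) that evaluates the three induced norms, the entrywise maximum and the empirical norm directly from their definitions.

```python
import numpy as np

def induced_norms(A):
    """Return (l1, l2, linf, entrywise max) norms of a matrix A, from the definitions."""
    A = np.asarray(A, dtype=float)
    l1 = np.abs(A).sum(axis=0).max()                   # largest absolute column sum
    l2 = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())    # sqrt of the maximal eigenvalue of A'A
    linf = np.abs(A).sum(axis=1).max()                 # largest absolute row sum
    entry_max = np.abs(A).max()                        # maximum over all entries
    return l1, l2, linf, entry_max

def empirical_norm(y):
    """Empirical norm ||y||_n = sqrt(n^{-1} * sum_i y_i^2)."""
    y = np.asarray(y, dtype=float)
    return np.sqrt(np.mean(y ** 2))

A = np.array([[1.0, -2.0],
              [3.0, 4.0]])
l1, l2, linf, entry_max = induced_norms(A)
```

For this particular matrix the column sums are 4 and 6 and the row sums are 3 and 7, so the $\ell_1$ and $\ell_\infty$ operator norms differ even though the entrywise maximum is 4 in both directions.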


2 Scaled Lasso for Threshold Regression 

Defining the $2m \times 1$ vectors $X_i(\tau) = (X_i', X_i' 1\{Q_i < \tau\})'$ and $\alpha_0 = (\beta_0', \delta_0')'$ one can rewrite (1) as

\[ Y_i = X_i(\tau_0)'\alpha_0 + U_i, \qquad i = 1,\dots,n, \qquad (2) \]

where $\tau_0$ is supposed to be an element of a parameter space $T = [t_0, t_1] \subset \mathbb{R}$ and $\alpha_0$ is supposed
to belong to a parameter space $\mathcal{A} \subset \mathbb{R}^{2m}$. This is exactly the model that Lee et al. (2012) studied
in the case where $m$ can be much larger than $n$. We shall be more specific about the probabilistic
assumptions in Section 3.1. Let $J(\alpha_0) = \{j = 1,\dots,2m : \alpha_0^{(j)} \ne 0\}$ be the indices of the non-zero
coefficients with cardinality $|J(\alpha_0)|$. Denoting by $X(\tau)$ the $(n \times 2m)$ matrix whose rows are $X_i(\tau)'$,
setting $Y = (Y_1,\dots,Y_n)'$, and $U = (U_1,\dots,U_n)'$, (2) can be written more compactly as

\[ Y = X(\tau_0)\alpha_0 + U. \]

Next, let $X^{(j)}(\tau)$ denote the $j$th column of $X(\tau)$ and define the $2m \times 2m$ diagonal matrix

\[ D(\tau) = \mathrm{diag}\{\|X^{(j)}(\tau)\|_n,\; j = 1,\dots,2m\}. \]
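Now set

\[ S_n(\alpha, \tau) = n^{-1}\sum_{i=1}^n \left(Y_i - X_i'\beta - X_i'\delta 1\{Q_i < \tau\}\right)^2 = \|Y - X(\tau)\alpha\|_n^2, \]

where $\alpha = (\beta', \delta')' \in \mathcal{A}$ and define the scaled $\ell_1$ penalty

\[ \lambda\|D(\tau)\alpha\|_{\ell_1} = \lambda\sum_{j=1}^{2m} \|X^{(j)}(\tau)\|_n |\alpha^{(j)}|, \]

where $\lambda$ is a tuning parameter about which we shall be explicit later. With this notation in place
we define for each $\tau \in T$

\[ \hat\alpha(\tau) = \operatorname*{argmin}_{\alpha \in \mathcal{A}} \left\{ S_n(\alpha, \tau) + \lambda\|D(\tau)\alpha\|_{\ell_1} \right\} \qquad (3) \]

and

\[ \hat\tau = \operatorname*{argmin}_{\tau \in T} \left\{ S_n(\hat\alpha(\tau), \tau) + \lambda\|D(\tau)\hat\alpha(\tau)\|_{\ell_1} \right\}. \]

To be precise, the minimizer in $\tau$ is an interval and in accordance with Lee et al. (2012) we define the maximum
of the interval as the estimator $\hat\tau$. For every $n$, it suffices in practice to search over $Q_1,\dots,Q_n$ as
candidates for $\hat\tau$ as these are the points where $1\{Q_i < \tau\}$, $i = 1,\dots,n$, can change. Therefore, the
estimator of $(\alpha_0, \tau_0)$ is defined as $(\hat\alpha, \hat\tau) = (\hat\alpha(\hat\tau), \hat\tau)$.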
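The two-step definition above (a scaled Lasso for each fixed $\tau$, then a search over $\tau$) can be sketched as follows. This is only an illustrative Python/NumPy sketch under stated assumptions: the paper's own computations use the glmnet package in R, whereas here a plain coordinate descent solver is swapped in to keep the example self-contained; the function names, the value `lam=0.1`, the grid and the simulated data are all hypothetical choices, and no intercept is included.

```python
import numpy as np

def build_design(X, Q, tau):
    """n x 2m matrix X(tau) whose rows are (X_i', X_i' 1{Q_i < tau})."""
    return np.hstack([X, X * (Q < tau)[:, None]])

def scaled_lasso_fixed_tau(Y, X_tau, lam, n_sweeps=200, tol=1e-8):
    """Minimise n^{-1}||Y - X(tau)a||^2 + lam*||D(tau)a||_1 by coordinate descent."""
    n, p = X_tau.shape
    d = np.sqrt(np.mean(X_tau ** 2, axis=0))    # diagonal of D(tau): empirical column norms
    active = d > 0                              # all-zero columns keep a zero coefficient
    Z = X_tau[:, active] / d[active]            # rescaled columns have empirical norm one
    gamma = np.zeros(Z.shape[1])                # gamma = D(tau) a
    resid = Y.astype(float).copy()
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(Z.shape[1]):
            zj = Z[:, j]
            rho = zj @ resid / n + gamma[j]     # uses z_j'z_j / n = 1
            new = np.sign(rho) * max(abs(rho) - lam / 2.0, 0.0)   # soft-threshold at lam/2
            if new != gamma[j]:
                resid -= zj * (new - gamma[j])
                max_change = max(max_change, abs(new - gamma[j]))
                gamma[j] = new
        if max_change < tol:
            break
    alpha = np.zeros(p)
    alpha[active] = gamma / d[active]
    objective = np.mean(resid ** 2) + lam * np.abs(gamma).sum()
    return alpha, objective

def scaled_lasso_threshold(Y, X, Q, lam, tau_grid):
    """Profile the penalised objective over an increasing grid of candidate thresholds."""
    best = None
    for tau in tau_grid:
        alpha, obj = scaled_lasso_fixed_tau(Y, build_design(X, Q, tau), lam)
        if best is None or obj <= best[2]:      # "<=" keeps the largest minimiser on ties
            best = (alpha, tau, obj)
    return best[0], best[1]

# Hypothetical data with a break at tau_0 = 0.5
rng = np.random.default_rng(0)
n, m = 200, 10
X = rng.standard_normal((n, m))
Q = rng.uniform(size=n)
beta0 = np.r_[2.0, 2.0, np.zeros(m - 2)]
delta0 = np.r_[2.0, -2.0, np.zeros(m - 2)]
Y = X @ beta0 + (X @ delta0) * (Q < 0.5) + 0.5 * rng.standard_normal(n)
tau_grid = np.arange(0.15, 0.851, 0.05)
alpha_hat, tau_hat = scaled_lasso_threshold(Y, X, Q, lam=0.1, tau_grid=tau_grid)
```

The sketch searches a fixed grid rather than the sample points $Q_1,\dots,Q_n$; replacing `tau_grid` by the sorted values of `Q` would follow the text more closely at a higher computational cost.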


Assuming fixed regressors and Gaussian error terms Lee et al. (2012) established oracle
inequalities for the prediction and $\ell_1$ estimation error of the Lasso estimator $\hat\alpha$. When a break
is present they also established upper bounds on the estimation error of $\hat\tau$. We contribute by
establishing oracle inequalities in the sup-norm for this non-linear model and show that we can
consistently detect breaks that are as small as the order of $\sqrt{\log(m)/n}$.


3 Uniform Convergence Rate of the Scaled Lasso Estimator 


In this section we establish upper bounds on the sup-norm estimation error $\|\hat\alpha - \alpha_0\|_{\ell_\infty}$. As argued
previously, and as will be made rigorous in Section 4, an upper bound on $\|\hat\delta - \delta_0\|_{\ell_\infty}$ is what is really
needed for break detection purposes. However, we shall actually establish a slightly stronger result
here which also makes it possible to efficiently select variables from the first $m$ columns of $X(\tau_0)$.
This sup-norm bound is established separately for the case where no break is present and for the
case where a break is present. Let $X$ and $Z(\tau)$ denote the first and last $m$ columns of $X(\tau)$ for
$\tau \in T$, respectively, and define

\[ r_n = \min_{1 \le j \le m} \frac{\|Z^{(j)}(t_0)\|_n^2}{\|X^{(j)}\|_n^2}. \]

Note that under Assumption 1 below it follows by Lemma 3 in the appendix that $r_n$ is bounded
away from zero with probability tending to one. $r_n$ is trivially never greater than one. Now define

\[ \lambda = A\left(\frac{\log(3m)}{n r_n}\right)^{1/2} \qquad (4) \]

as the tuning parameter for a constant $A > 0$. Assuming an i.i.d. sample we let $\Sigma(\tau) =
E(X_i(\tau)X_i(\tau)')$ denote the population covariance matrix of the covariates. In Lemma 1 below we
give sufficient conditions for its inverse $\Theta(\tau)$ to exist as long as $\Sigma = E(X_iX_i')$ is invertible, which is
a standard assumption in regression models. Thus, the practical consequence is that the presence
of indicator functions in the definition of $X_i(\tau)$ does not make it singular. Now we introduce the
assumptions that our theorems rely on.


3.1 Assumptions 


In this section we recall the assumptions used by Lee et al. (2012) in their Theorems 2 and 3 which
are used as ingredients in the proofs of our Theorems 1 and 2. To be precise, we use the oracle
inequalities for the $\ell_1$ estimation errors of $\hat\alpha$ and $\hat\tau$ provided by Lee et al. (2012). We alter their
assumptions slightly, as we are working in a random design as opposed to their fixed regressor
design. However, Lee et al. (2012) have already argued how some of their assumptions could be
valid in a random design and as a consequence we do not need to address these in detail.

Assumption 1. Let $\{X_i, U_i, Q_i\}_{i=1}^n$ be an i.i.d. sample and let $(X_i, U_i)$ be independent of $Q_i$.
Furthermore, let $Q_i$ be uniformly distributed on $[0,1]$ and assume that all entries of $X_i$ and $U_i$
are subgaussian^1 with $\min_{1 \le j \le m} E(X_i^{(j)2})$ bounded away from zero. (i) For the parameter space $\mathcal{A}$
for $\alpha_0$, any $\alpha = (\alpha_1, \dots, \alpha_{2m}) \in \mathcal{A} \subset \mathbb{R}^{2m}$, including $\alpha_0$, satisfies $\max_{1 \le j \le 2m} |\alpha_j| \le C_1$, for some
constant $C_1 > 0$. In addition, $\tau_0 \in T = [t_0, t_1]$ with $0 < t_0 < t_1 < 1$. (ii) $\log(m)/n \to 0$.


Assumption 1 is the one which has been altered the most compared to Lee et al. (2012) as the
boundedness of certain norms of the covariates no longer has to be assumed as this now
follows directly from independence and subgaussianity of these. See Lemma 3 in the appendix for
details. Furthermore, the absence of ties among the $Q_i$, $i = 1,\dots,n$ (as required in Lee et al. (2012))
follows in an almost sure sense from these being uniformly (and thus continuously) distributed.

The assumption of the sample being i.i.d. can most likely be relaxed by exchanging the probabilistic
inequalities used in the appendix for ones allowing for weak dependence and/or heterogeneity.
For convenience, we have also assumed that $X_i$ and $Q_i$ are independent. However, as the main
contribution of this paper is to provide sup-norm bounds for high-dimensional non-linear models
as the first in the literature (to the best of our knowledge) we have chosen to keep the probabilistic
framework simple in order not to suffocate the cardinal ideas in technicalities.

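Assumption 2. (Uniform Restricted Eigenvalue Condition). For some integer $s$ such that $1 \le
s \le 2m$, a positive number $c_0$ and some set $S \subset \mathbb{R}$, the following condition holds wpa1

\[ \kappa(s, c_0, S) = \min_{\tau \in S} \; \min_{J_0 \subset \{1,\dots,2m\},\, |J_0| \le s} \; \min_{\gamma \ne 0,\; \|\gamma_{J_0^c}\|_{\ell_1} \le c_0\|\gamma_{J_0}\|_{\ell_1}} \frac{\|X(\tau)\gamma\|_{\ell_2}}{\sqrt{n}\,\|\gamma_{J_0}\|_{\ell_2}} > 0. \qquad (5) \]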


In the random design considered in this paper we require Assumption 2 of Lee et al. (2012) above
to be valid with probability tending to one. However, this is an unnecessarily high-level assumption
as it can often be verified by assuming that $\Sigma(\tau)$ satisfies the uniform restricted eigenvalue condition
(which it does in particular when it has full rank - as is in turn true under Assumption 1 if $\Sigma$ has
full rank as argued on page A4 in Lee et al. (2012)) and by showing that $n^{-1}X(\tau)'X(\tau)$ is uniformly
close to $\Sigma(\tau)$. Mimicking the arguments on pages A3-A6 in Lee et al. (2012) it can be shown that
(5) above holds with probability tending to one under our Assumption 1 as long as $\Sigma$ has full rank
- a rather innocent assumption. Thus, Assumption 2 is almost automatic under Assumption 1 and
we shall use this in the statements of Theorems 1 and 2 below.

^1 The notation suppresses that we are really dealing with a triangular array. Thus, more precisely, we assume
uniform subgaussianity across the rows of this triangular array.

For the next assumption, define $f_{\alpha,\tau}(x, q) = x'\beta + x'\delta 1\{q < \tau\}$ and $f_0(x, q) = x'\beta_0 + x'\delta_0 1\{q < \tau_0\}$
and let $m(\alpha)$ denote the number of non-zero elements of $\alpha$.

Assumption 3. (Identifiability under Sparsity and Discontinuity of Regression). For a given
$s \ge |J(\alpha_0)|$, and for any $\eta$ and $\tau$ such that $|\tau - \tau_0| > \eta \ge \min_i |Q_i - \tau_0|$, and $\alpha \in \{\alpha : m(\alpha) \le s\}$
there exists a constant $c > 0$ such that, wpa1,

\[ \|f_{\alpha,\tau} - f_0\|_n > c\eta. \]

For this assumption Lee et al. (2012) (pages A7-A8) also provide sufficient conditions
encompassing the assumptions made in Assumption 1 above.


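Assumption 4. (Smoothness of Design). For any $\eta > 0$, there exists a constant $C < \infty$ such that
wpa1

\[ \sup_{1 \le j,k \le m} \; \sup_{|\tau - \tau_0| < \eta} \; \frac{1}{n}\sum_{i=1}^n \left|X_i^{(j)}X_i^{(k)}\right| \left|1\{Q_i < \tau_0\} - 1\{Q_i < \tau\}\right| \le C\eta. \]

Lee et al. (2012) argue that this is the case when the $Q_i$ are continuously distributed and
$E(|X_i^{(j)}X_i^{(k)}| \mid Q_i = \tau)$ is continuous and bounded in a neighborhood of $\tau_0$ for all $1 \le j,k \le m$.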




Note however, that the outer supremum in Assumption 4 above is taken over all $1 \le j,k \le m$
as opposed to only $1 \le j \le m$ in Lee et al. (2012) as $|X_i^{(j)}X_i^{(k)}|$ has replaced $X_i^{(j)2}$. This slight
strengthening of the assumption is needed to establish an $\ell_\infty$ bound on the estimation error of $\hat\alpha$
in the case where a structural break is present (Theorem 2 below).

Assumption 5. (Well defined second moments). For any $\eta$ such that $1/n \le \eta \le \eta_0$, $h_n^2(\eta)$ is
bounded wpa1, where

\[ h_n^2(\eta) = \frac{1}{2n\eta} \sum_{i=\max\{1,[n(\tau_0 - \eta)]\}}^{\min\{[n(\tau_0 + \eta)],\, n\}} \left(X_i'\delta_0\right)^2 \]

and $[\cdot]$ denotes the integer part of a real number.


Finally, we also need to impose the same technical regularity condition as Lee et al. (2012)
which they denote Assumption 6 and present on page A23 of their paper. This assumption is
satisfied asymptotically in our context when $s\lambda\|\delta_0\|_{\ell_1} \to 0$. Since $\max_{1 \le j \le m} |\delta_0^{(j)}| \le C_1$ by
Assumption 1 above this is in turn true when $s|J(\delta_0)|\log(m)^{1/2}/\sqrt{n} \to 0$. The latter assumption
will be assumed in Theorem 2 below (as we also need it for another purpose) and thus Assumption
6 in Lee et al. (2012) is automatic in our case.


3.2 sup-norm rate of convergence of $\hat\alpha$

We next turn to providing upper bounds on the $\ell_\infty$ estimation error of $\hat\alpha$. We distinguish between
the case in which no break is present and the case in which a break is present.

Theorem 1. Suppose that $\delta_0 = 0$ and let Assumption 1 be satisfied. Furthermore, let $|J(\alpha_0)| \le s$,
assume that $\Sigma$ has full rank and that $\Theta(\tau) = \Sigma^{-1}(\tau)$ satisfies $\sup_{\tau \in T}\|\Theta(\tau)\|_{\ell_\infty} < \infty$. Then,

choosing A as in 0 and assuming ^ 0, one has 

\[ \|\hat\alpha - \alpha_0\|_{\ell_\infty} = O_p\left(\sqrt{\log(m)/n}\right) = O_p(\lambda). \]

Thus, a fortiori, we also have $\|\hat\delta - \delta_0\|_{\ell_\infty} = O_p\left(\sqrt{\log(m)/n}\right) = O_p(\lambda)$.

Theorem 1 provides the stochastic order of the $\ell_\infty$ estimation error of $\hat\alpha$ for the case where
no break is present. From Theorem 1 in Lee et al. (2012) (ignoring that their results are for
non-random regressors) one can conclude that $\|\hat\alpha - \alpha_0\|_{\ell_1} = O_p(s\sqrt{\log(m)/n})$. From this, one can of
course also conclude that $\|\hat\alpha - \alpha_0\|_{\ell_\infty} \le \|\hat\alpha - \alpha_0\|_{\ell_1} = O_p(s\sqrt{\log(m)/n})$. However, our Theorem 1
shows that this rate is much too large as $s$ may be almost as large as $O(\sqrt{n})$ without obstructing
$\ell_1$ norm consistency. Our much smaller bound will allow for more precise thresholding in Section 4.

We stress again that almost all research in high-dimensional models so far has focussed exclusively
on providing upper bounds on the $\ell_1$ and $\ell_2$ estimation errors. $\ell_\infty$ bounds on the estimation error have been
established for the Lasso in the plain linear regression model by Lounici (2008) and van de Geer
(2014). However, to the best of our knowledge we are the first to establish sup-norm bounds for
high-dimensional non-linear models, and certainly in the threshold model. As we shall see below, a
sup-norm bound will yield much more precise variable selection results for the thresholded scaled
Lasso than thresholding based on $\ell_1$ or $\ell_2$ bounds since the latter two are larger due to the presence
of the unknown sparsity $s$. Next, consider the case where $\delta_0 \ne 0$, i.e. a break is present.


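Theorem 2. Suppose that $\delta_0 \ne 0$ and let Assumptions 1 and 3-5 be satisfied. Furthermore, let
$|J(\alpha_0)| \le s$, assume that $\Sigma$ has full rank and that $\|\Theta(\tau_0)\|_{\ell_\infty} < \infty$. Then, choosing $\lambda$ as in (4) and
assuming $s|J(\delta_0)|\log(m)^{1/2}/\sqrt{n} \to 0$, one has

\[ \|\hat\alpha - \alpha_0\|_{\ell_\infty} = O_p\left(\sqrt{\log(m)/n}\right) = O_p(\lambda). \]

Thus, a fortiori, we also have $\|\hat\delta - \delta_0\|_{\ell_\infty} = O_p\left(\sqrt{\log(m)/n}\right) = O_p(\lambda)$.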

The results of Theorem 2 are similar to those in Theorem 1 but the assumptions differ. First,
$\|\Theta(\tau)\|_{\ell_\infty}$ only has to be bounded at $\tau_0$ instead of uniformly over $T = [t_0, t_1]$ for $0 < t_0 < t_1 < 1$.
Lemma 1 below shows that both $\sup_{\tau \in T}\|\Theta(\tau)\|_{\ell_\infty} < \infty$ and $\|\Theta(\tau_0)\|_{\ell_\infty} < \infty$ hold in the equicorrelation
design, but of course with the former being no smaller than the latter. More importantly, requiring
$s|J(\delta_0)|\log(m)^{1/2}/\sqrt{n} \to 0$ is in general more restrictive than the rate condition of
Theorem 1. However, if the number of coefficients which break is bounded, i.e. $|J(\delta_0)| \le B$ for an
absolute constant $B$, then the rate requirement of Theorem 2 is actually slightly weaker than the
one in Theorem 1.

The following Lemma shows that even when the covariates are highly correlated, $\Theta(\tau)$ exists and
the assumptions $\sup_{\tau \in T}\|\Theta(\tau)\|_{\ell_\infty} < \infty$ and $\|\Theta(\tau_0)\|_{\ell_\infty} < \infty$ from Theorems 1 and 2, respectively,
are satisfied. First, recall the definition of an equicorrelation design.

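Definition 1. We say that $\Sigma$ is an equicorrelation matrix if

\[ \Sigma = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix} \]

for some $-1 < \rho < 1$.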
Lemma 1. Let $\{X_i, Q_i\}_{i=1}^n$ be an iid sample and assume that $Q_i$ is uniformly distributed on $[0,1]$.

Let $\Sigma = E(X_iX_i')$ be an $m \times m$ equicorrelation matrix with $0 \le \rho < 1$. Then $\Theta(\tau)$ exists and
for all T G (0,1) one /las ||0(r)||^ < (2 V If, furthermore, T = for some 

$0 < t_0 < t_1 < 1$, then $\sup_{\tau \in T}\|\Theta(\tau)\|_{\ell_\infty}$ is bounded by a constant only depending on $\rho$.

Lemma 1 states that $\|\Theta(\tau)\|_{\ell_\infty}$ is bounded for all $\tau \in (0,1)$ even when the correlation is
arbitrarily close to, but different from, one. $\tau$ cannot be zero or one since in that case $\Sigma(\tau)$ would
be singular. From a modeling point of view this excludes breaks at the very endpoints of the sample
which is a standard assumption in the literature.
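The behaviour described by the lemma can be inspected numerically. The sketch below is illustrative Python/NumPy, not from the paper; it assumes, as in Assumption 1, that $Q_i$ is uniform on $[0,1]$ and independent of $X_i$, so that $E(X_iX_i'1\{Q_i<\tau\}) = \tau\Sigma$ and $\Sigma(\tau)$ has the block form used in `sigma_tau`. The helper names are hypothetical.

```python
import numpy as np

def equicorrelation(m, rho):
    """m x m matrix with ones on the diagonal and rho elsewhere."""
    return (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))

def sigma_tau(Sigma, tau):
    """Sigma(tau) = E[X_i(tau) X_i(tau)'] when Q_i ~ U[0,1] is independent of X_i."""
    return np.block([[Sigma, tau * Sigma],
                     [tau * Sigma, tau * Sigma]])

def linf_operator_norm(A):
    """Induced l_infinity norm: the largest absolute row sum."""
    return np.abs(A).sum(axis=1).max()

m, rho = 5, 0.95
Sigma = equicorrelation(m, rho)
for tau in (0.1, 0.5, 0.9):
    Theta = np.linalg.inv(sigma_tau(Sigma, tau))
    print(f"tau = {tau:.1f}: ||Theta(tau)||_linf = {linf_operator_norm(Theta):.2f}")
```

Under this block structure, block inversion gives $\|\Theta(\tau)\|_{\ell_\infty} = \frac{1+\tau}{\tau(1-\tau)}\|\Sigma^{-1}\|_{\ell_\infty}$, which is finite for every $\tau \in (0,1)$ but diverges as $\tau$ approaches either endpoint, in line with the singularity of $\Sigma(\tau)$ at zero and one.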


4 Thresholded Scaled Lasso 


In this section we utilize the $\ell_\infty$ bound established in Theorems 1 and 2 above to provide sharp
thresholding results for the scaled Lasso estimator. Recall that these theorems established that
$\|\hat\alpha - \alpha_0\|_{\ell_\infty} \le C\lambda$ with arbitrarily large probability, irrespective of whether a break is present or not,
by choosing $C$ sufficiently large. Before showing that the breaks can be revealed consistently we shall
provide a slightly more general result stating that the truly zero coefficients can be distinguished
from the non-zero ones. First, define the thresholded scaled Lasso estimator as

\[ \tilde\alpha^{(j)} = \begin{cases} \hat\alpha^{(j)} & \text{if } |\hat\alpha^{(j)}| \ge H, \\ 0 & \text{if } |\hat\alpha^{(j)}| < H, \end{cases} \qquad (6) \]

where $H$ is the threshold determining whether a coefficient should be classified as zero or non-zero.
In particular, we shall see that choosing $H = 2C\lambda$ results in consistent model selection. Here
we stress once more that our threshold is much sharper than what would have been obtainable
if we had directly used that $\|\hat\alpha - \alpha_0\|_{\ell_1} \le Cs\lambda$ with probability tending to one from Lee et al.
(2012). Thus, it is important to have an $\ell_\infty$ bound on the estimation error as this allows for a
much finer distinction between the zero and the non-zero coefficients than would have been possible from
the usual $\ell_1$ or $\ell_2$ bounds. To be precise, let $\alpha_0^{(j)}$ be a non-zero coefficient such that $|\alpha_0^{(j)}|/\lambda \to \infty$
but $|\alpha_0^{(j)}|/(s\lambda) \to 0$. Note that there may be a considerable wedge between $|\alpha_0^{(j)}|/\lambda$ and $|\alpha_0^{(j)}|/(s\lambda)$
as $s$ can be almost as large as $\sqrt{n}$ such that this is a setting of practical relevance. Such an
$\alpha_0^{(j)}$ will correctly be classified as non-zero when thresholding at the level $\lambda$ (resulting from an $\ell_\infty$
bound) while it would wrongly be classified as zero when thresholding at the level $s\lambda$ (resulting
from a plain $\ell_1$ bound). This example underscores the importance of establishing $\ell_\infty$ bounds as
in Theorems 1 and 2 prior to thresholding. Next, recall that $J(\alpha_0) = \{j = 1,\dots,2m : \alpha_0^{(j)} \ne 0\}$
and define $J(\tilde\alpha) = \{j = 1,\dots,2m : \tilde\alpha^{(j)} \ne 0\}$. The following theorems establish the properties of the
thresholded scaled Lasso and rely crucially on the $\ell_\infty$ bounds on the estimation error established
in Theorems 1 and 2 above.
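The thresholding rule and the wedge between the levels $\lambda$ and $s\lambda$ can be illustrated with a few lines of Python. This is a minimal sketch with made-up numbers (the values of `lam`, `s`, `alpha0` and `alpha_hat` are hypothetical and the constant $C$ is set to one); it is not the authors' implementation.

```python
import numpy as np

def threshold_scaled_lasso(alpha_hat, H):
    """Keep entries with |alpha_hat_j| >= H and set all others to zero."""
    alpha_hat = np.asarray(alpha_hat, dtype=float)
    return np.where(np.abs(alpha_hat) >= H, alpha_hat, 0.0)

def support(alpha):
    """Index set of the non-zero entries."""
    return set(np.flatnonzero(alpha).tolist())

# Hypothetical values: sup-norm error 0.07 <= lam, smallest non-zero coefficient 0.4 > 3 * lam
lam, s = 0.1, 5
alpha0 = np.array([2.0, 0.4, 0.0, 0.0])
alpha_hat = np.array([1.93, 0.35, 0.04, -0.06])

sup_level = threshold_scaled_lasso(alpha_hat, H=2 * lam)      # threshold of order lam
l1_level = threshold_scaled_lasso(alpha_hat, H=2 * s * lam)   # threshold of order s * lam
```

In this toy example the threshold of order $\lambda$ recovers the support of `alpha0` exactly, while the threshold of order $s\lambda$ also removes the small but non-zero second coefficient.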


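Theorem 3. Let the assumptions of Theorems 1 and 2 be satisfied and assume that $\min_{j \in J(\alpha_0)} |\alpha_0^{(j)}| >
3C\lambda$. Then, for all $\varepsilon > 0$ there exists a $C$ such that for $H = 2C\lambda$, with $\lambda$ as in (4), one has
$P(J(\tilde\alpha) = J(\alpha_0)) \ge 1 - \varepsilon$.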


Theorem 3 states that consistent model selection is possible with the thresholded Lasso in
the nonlinear break point regression model as long as the non-zero coefficients are at least of the
order $\sqrt{\log(m)/n}$. This is considerably sharper than thresholding based on $\ell_1$ estimation errors where
consistent variable selection would require the non-zero coefficients to be at least of order $s\sqrt{\log(m)/n}$.
The idea in the proof of Theorem 3 is similar to the one for the linear case in Lounici (2008).
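The core of that idea can be written out in two lines. The following is only a sketch of the deterministic step, assuming we are on the event $\{\|\hat\alpha - \alpha_0\|_{\ell_\infty} \le C\lambda\}$ and using the thresholding rule (6) with $H = 2C\lambda$; the complete argument is in the appendix of the paper.

```latex
% On the event { ||\hat\alpha - \alpha_0||_{\ell_\infty} \le C\lambda }, with H = 2C\lambda:
\begin{align*}
j \notin J(\alpha_0):&\quad
  |\hat\alpha^{(j)}| = |\hat\alpha^{(j)} - \alpha_0^{(j)}| \le C\lambda < H
  \;\Longrightarrow\; \tilde\alpha^{(j)} = 0, \\
j \in J(\alpha_0):&\quad
  |\hat\alpha^{(j)}| \ge |\alpha_0^{(j)}| - |\hat\alpha^{(j)} - \alpha_0^{(j)}|
  > 3C\lambda - C\lambda = H
  \;\Longrightarrow\; \tilde\alpha^{(j)} = \hat\alpha^{(j)} \ne 0.
\end{align*}
```

Hence $J(\tilde\alpha) = J(\alpha_0)$ on this event, and the sup-norm bounds make its probability at least $1 - \varepsilon$ once $C$ is chosen large enough.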


Note that if one is only interested in finding out whether there is a break or not, i.e. whether
$\delta_0$ is non-zero or not, one can simply threshold $\hat\delta$ only according to the rule in (6). Defining
$J(\delta_0) = \{j = 1,\dots,m : \delta_0^{(j)} \ne 0\}$ and $J(\tilde\delta) = \{j = 1,\dots,m : \tilde\delta^{(j)} \ne 0\}$ we have the following result on
consistent break detection.

Theorem 4. Let the assumptions of Theorems 1 and 2 be satisfied and assume that $\min_{j \in J(\delta_0)} |\delta_0^{(j)}| >
3C\lambda$. Then, for all $\varepsilon > 0$ there exists a $C$ such that for $H = 2C\lambda$, with $\lambda$ as in (4), one has
$P(J(\tilde\delta) = J(\delta_0)) \ge 1 - \varepsilon$.

Break selection consistency is weaker than model selection consistency as it only requires classifying
$\delta_0$ correctly. However, it is still relevant as it answers the question whether a break is present
or not. We discuss how to choose the threshold parameter $C$ in practice in Section 5.


5 Simulations 


In this section we report the results of a series of simulation experiments evaluating the finite
sample properties of the thresholded scaled Lasso. We shall consider performance along the dimensions:
increasing number of irrelevant variables, estimation in the absence of a threshold, increasing
number of observations, scale of the parameters, and increasing number of non-zero variables.

The regressors are generated as $X_i \sim N(0, I)$, the threshold variable $Q_i \sim U[0,1]$, and the
innovations $U_i \sim N(0, \sigma^2)$ where we set the residual variance $\sigma^2 = 0.25$, $i = 1,\dots,n$. When the
threshold parameter $\tau_0$ is not explicitly stated it is set to $\tau_0 = 0.5$; we search for $\hat\tau$ over a grid from
0.15 to 0.85 by steps of 0.05. This grid is coarser than the grid used in Lee et al. (2012) which, in
our experience, has a mild detrimental effect on the precision with which $\tau_0$ is estimated but not
on other measures of the quality of the estimator while substantially reducing computation time,
thus allowing us to carry out more replications. We select the thresholding parameter $C$ by BIC
using a grid from 0.1 to 5, so that parameters smaller (in absolute value) than $C\lambda$ are set to zero
by the thresholded scaled Lasso.

Every model is estimated with an intercept so that we estimate $2m + 1$ parameters, plus the
threshold parameter $\tau_0$. All the results reported below are based on 1000 replications. The
simulations are carried out with R (R Development Core Team, 2008) using the glmnet package of
Friedman et al. (2010). The results (and those of the empirical application in Section 6) can be
replicated using knitr (Xie, 2014) and the supplementary material^2.
We report the following statistics, averaged across iterations.


• MSE: mean square prediction error.

• $|J(\hat\alpha) \cap J(\alpha_0)^c|$: number of zero parameters incorrectly retained in the model.

• $|J(\alpha_0) \cap J(\hat\alpha)^c|$: number of non-zero parameters excluded.

• Perfect Sel.: the share (in %) of iterations for which we have perfect model selection.

• $\|\hat\alpha - \alpha_0\|_{\ell_1}$: $\ell_1$ estimation error for the parameters.

• $\|\hat\alpha - \alpha_0\|_{\ell_\infty}$: $\ell_\infty$ estimation error for the parameters.

• $|\hat\tau - \tau_0|$: absolute threshold parameter estimation error.

• $C$: selected (BIC) thresholding parameter.

• $\lambda$: selected (BIC) penalty parameter.

^2 Available at https://github.com/lcallot/ttlas
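For a single replication, the selection and estimation statistics in the list above can be computed as in the following sketch. It is illustrative Python/NumPy rather than the authors' R code; the function name, the dictionary keys and the example vectors are hypothetical.

```python
import numpy as np

def selection_metrics(alpha_est, alpha0):
    """Selection and estimation error statistics for one estimate of alpha_0."""
    alpha_est = np.asarray(alpha_est, dtype=float)
    alpha0 = np.asarray(alpha0, dtype=float)
    est_nz, true_nz = alpha_est != 0, alpha0 != 0
    return {
        "zeros_retained": int(np.sum(est_nz & ~true_nz)),       # |J(est) ∩ J(alpha_0)^c|
        "nonzeros_excluded": int(np.sum(true_nz & ~est_nz)),    # |J(alpha_0) ∩ J(est)^c|
        "perfect_selection": bool(np.array_equal(est_nz, true_nz)),
        "l1_error": float(np.abs(alpha_est - alpha0).sum()),
        "linf_error": float(np.abs(alpha_est - alpha0).max()),
    }

metrics = selection_metrics([1.9, 0.0, 0.3, 0.0], [2.0, 0.5, 0.0, 0.0])
```

Averaging `zeros_retained`, `nonzeros_excluded` and the two error measures over replications, and taking the percentage of replications with `perfect_selection` equal to `True`, gives the corresponding columns of the tables.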
m     tau_0  Estimator     MSE    Zeros kept  Non-zeros excl.  Perfect Sel.  l1 err.  linf err.  tau err.  C     lambda
50    0.3    Lasso                            0.06             3             3.39     0.85       0.27      -     0.06
50    0.3    Thresholded   1.22   0.04        0.11             86            3.29     0.85       -         1.46  -
50    0.4    Lasso         1.42   5.23        0.07             1             3.68     0.94       0.22      -     0.05
50    0.4    Thresholded   1.45   0.04        0.16             81            3.55     0.94       -         1.49  -
50    0.5    Lasso         1.55   5.72        0.07             0             3.99     1.00       0.18      -     0.05
50    0.5    Thresholded   1.58   0.06        0.14             82            3.85     1.01       -         1.45  -
100   0.3    Lasso         1.34   5.59        0.05             1             3.99     0.95       0.25      -     0.07
100   0.3    Thresholded   1.38   0.04        0.13             85            3.86     0.95       -         1.27  -
100   0.4    Lasso         1.56   6.26        0.08             0             4.29     1.03       0.22      -     0.07
100   0.4    Thresholded   1.60   0.05        0.16             82            4.15     1.03       -         1.25  -
100   0.5    Lasso         1.77   7.27        0.12             0             4.77     1.10       0.19      -     0.07
100   0.5    Thresholded   1.83   0.07        0.21             78            4.60     1.11       -         1.22  -
200   0.3    Lasso         1.57   7.06        0.10             0             4.65     1.06       0.25      -     0.09
200   0.3    Thresholded   1.62   0.03        0.19             82            4.49     1.06       -         1.15  -
200   0.4    Lasso         1.80   8.10        0.12             0             5.04     1.14       0.22      -     0.09
200   0.4    Thresholded   1.87   0.03        0.22             79            4.86     1.15       -         1.12  -
200   0.5    Lasso         2.22   9.20        0.26             0             5.82     1.27       0.18      -     0.09
200   0.5    Thresholded   2.30   0.06        0.40             71            5.60     1.28       -         1.07  -
400   0.3    Lasso         1.73   8.81        0.15             0             5.38     1.16       0.26      -     0.10
400   0.3    Thresholded   1.81   0.03        0.23             81            5.18     1.17       -         1.04  -
400   0.4    Lasso         2.16   9.35        0.33             0             6.17     1.30       0.22      -     0.12
400   0.4    Thresholded   2.26   0.04        0.47             73            5.94     1.31       -         0.98  -
400   0.5    Lasso         2.84   9.81        0.66             0             7.26     1.46       0.19      -     0.13
400   0.5    Thresholded   2.96   0.03        0.91             60            7.02     1.47       -         0.90  -

Table 1: Lasso and Thresholded Lasso. Increasing number
of zero parameters and 3 locations of $\tau_0$.

Table 1 contains the results of experiments where we consider 4 different dimensions for the
parameter vectors and multiple locations for $\tau_0$. The data is generated as follows:

• Sample size: $n = 200$, $\beta = [2, 2, 2, 2, 2, 0, \dots, 0]$, $\delta = [2, -2, 2, -2, 2, 0, \dots, 0]$.

• The length of $\beta$ and $\delta$ is $m = 50, 100, 200, 400$.
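One draw from this data generating process can be sketched as follows. The snippet is illustrative Python/NumPy (the simulations in the paper are run in R); the function name and the seed are hypothetical, and the intercept that the paper includes at the estimation stage is not part of the generated data.

```python
import numpy as np

def simulate_threshold_data(n, m, tau0, sigma2=0.25, seed=None):
    """One sample from Y_i = X_i'beta_0 + X_i'delta_0 1{Q_i < tau_0} + U_i."""
    rng = np.random.default_rng(seed)
    beta0 = np.zeros(m)
    beta0[:5] = 2.0                                   # beta = [2, 2, 2, 2, 2, 0, ..., 0]
    delta0 = np.zeros(m)
    delta0[:5] = [2.0, -2.0, 2.0, -2.0, 2.0]          # delta = [2, -2, 2, -2, 2, 0, ..., 0]
    X = rng.standard_normal((n, m))                   # X_i ~ N(0, I)
    Q = rng.uniform(0.0, 1.0, size=n)                 # Q_i ~ U[0, 1]
    U = rng.normal(0.0, np.sqrt(sigma2), size=n)      # U_i ~ N(0, sigma^2)
    Y = X @ beta0 + (X @ delta0) * (Q < tau0) + U
    return Y, X, Q, beta0, delta0

Y, X, Q, beta0, delta0 = simulate_threshold_data(n=200, m=50, tau0=0.5, seed=1)
```

Setting `delta0` to zero in the same function would give the no-break design, and varying `n` or `m` gives the other sample sizes and dimensions.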

The most important finding in Table 1 is that across all settings the scaled Lasso almost never
detects the true model while its thresholded version does so very often and rather consistently across
the settings. As expected, the scaled Lasso does a good job at model screening in the sense that
it retains all relevant variables in many instances. However, it often fails to exclude irrelevant
variables. This is exactly where the thresholding sets in - it weeds out the variables falsely retained
by the first step scaled Lasso. To illustrate this, consider the setting of $m = 400$ and $\tau_0 = 0.5$. Here
the scaled Lasso includes almost ten irrelevant variables on average while its thresholded version
includes as few as 0.03 irrelevant variables on average. Note also how the $\ell_\infty$ estimation error is
much lower than its $\ell_1$ counterpart, confirming our theoretical results from Theorems 1 and 2 and thus
allowing for much sharper thresholding than usual. This important finding is confirmed in all of
the other settings below.

Perfect model selection seems to be slightly easier for lower values of the threshold parameter $\tau_0$.
On the other hand, $\hat\tau$ becomes less precise as $\tau_0$ is lowered. All other measures in general improve
slightly when $\tau_0$ is lowered. Increasing the dimension of the model, $m$, worsens most performance
measures except for the estimation error of $\hat\tau$ which stays constant. Finally, in larger models more
penalization is applied as can be seen from the larger choice of $\lambda$ as $m$ is increased.

Table 2 considers the case where no threshold effect is present, $\delta_0 = 0$. The exact data generating
process is:

• Sample size: $n = 200$, $\beta = [2, 2, 2, 2, 2, 0, \dots, 0]$, $\delta = [0, \dots, 0]$.

• The length of $\beta$ and $\delta$ is $m = 50, 100, 200, 400$.





m     Estimator     MSE    Zeros kept  Non-zeros excl.  Perfect Sel.  l1 err.  linf err.  C     lambda
50    Lasso         0.29   1.56        0.00             23            0.60     0.16       -     0.07
50    Thresholded   0.29   0.21        0.00             81            0.56     0.16       0.73  -
100   Lasso         0.30   1.56        0.00             23            0.65     0.17       -     0.08
100   Thresholded   0.31   0.18        0.00             83            0.61     0.17       0.61  -
200   Lasso         0.31   1.45        0.00             27            0.70     0.18       -     0.09
200   Thresholded   0.32   0.15        0.00             86            0.66     0.18       0.53  -
400   Lasso         0.32   1.44        0.00             27            0.74     0.19       -     0.10
400   Thresholded   0.33   0.12        0.00             89            0.71     0.19       0.46  -

Table 2: Lasso and Thresholded Lasso. No threshold effect
($\delta_0 = 0$), $n = 200$, 4 different lengths of the parameter vector.

The main finding of Table 2 is that almost all performance measures improve drastically compared
to Table 1. This is the case in particular for large $m$ as the performance is no longer worsened
as $m$ increases. Note, for example, that the MSE and $\ell_1$ estimation error of $\hat\alpha$ are almost ten times
lower for $m = 400$ than they were in Table 1. Most importantly for us, the perfect model selection
percentage is now also stable across $m$.

In order to investigate the asymptotic properties of our procedure, Table 3 reveals the effect of
increasing the sample size for two values of $\tau_0$. The exact DGP is:

• Sample size: $n = 50, 100, 200, 500, 1000$.

• $\beta = [2, 2, 2, 2, 2, 0, \dots, 0]$, $\delta = [2, -2, 2, -2, 2, 0, \dots, 0]$.
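
| $\tau_0$ | n | Estimator | MSE | Irrelevant included | Relevant excluded | Perfect selection (%) | $\ell_1$ error | $\ell_\infty$ error | $\hat\tau$ error | C | $\lambda$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.3 | 50 | L | 10.04 | 1.83 | 4.92 | 0 | 14.72 | 1.99 | 0.30 | - | 0.58 |
| 0.3 | 50 | T | 10.64 | 0.29 | 5.51 | 0 | 14.66 | 1.99 | - | 0.67 | - |
| 0.3 | 100 | L | 3.34 | 7.22 | 1.09 | 0 | 7.92 | 1.51 | 0.27 | - | 0.15 |
| 0.3 | 100 | T | 3.53 | 0.12 | 1.38 | 45 | 7.63 | 1.51 | - | 1.32 | - |
| 0.3 | 200 | L | 1.46 | 5.56 | 0.08 | 1 | 4.07 | 1.00 | 0.25 | - | 0.07 |
| 0.3 | 200 | T | 1.50 | 0.04 | 0.16 | 82 | 3.95 | 1.00 | - | 1.25 | - |
| 0.3 | 500 | L | 0.76 | 3.31 | 0.01 | 6 | 2.27 | 0.64 | 0.17 | - | 0.04 |
| 0.3 | 500 | T | 0.76 | 0.01 | 0.02 | 97 | 2.23 | 0.64 | - | 0.95 | - |
| 0.3 | 1000 | L | 0.50 | 2.62 | 0.00 | 10 | 1.51 | 0.45 | 0.06 | - | 0.03 |
| 0.3 | 1000 | T | 0.50 | 0.00 | 0.01 | 98 | 1.49 | 0.45 | - | 0.81 | - |
| 0.5 | 50 | L | 8.98 | 1.81 | 4.84 | 0 | 14.56 | 2.00 | 0.21 | - | 0.48 |
| 0.5 | 50 | T | 9.52 | 0.24 | 5.43 | 0 | 14.48 | 2.00 | - | 0.62 | - |
| 0.5 | 100 | L | 4.73 | 5.41 | 2.15 | 0 | 10.05 | 1.75 | 0.20 | - | 0.21 |
| 0.5 | 100 | T | 4.94 | 0.12 | 2.62 | 23 | 9.84 | 1.75 | - | 1.00 | - |
| 0.5 | 200 | L | 1.83 | 7.41 | 0.12 | 0 | 4.83 | 1.14 | 0.18 | - | 0.07 |
| 0.5 | 200 | T | 1.89 | 0.06 | 0.21 | 78 | 4.66 | 1.14 | - | 1.22 | - |
| 0.5 | 500 | L | 0.86 | 4.32 | 0.01 | 2 | 2.53 | 0.69 | 0.18 | - | 0.04 |
| 0.5 | 500 | T | 0.87 | 0.01 | 0.04 | 96 | 2.48 | 0.69 | - | 0.96 | - |
| 0.5 | 1000 | L | 0.55 | 3.27 | 0.00 | 8 | 1.70 | 0.49 | 0.08 | - | 0.03 |
| 0.5 | 1000 | T | 0.55 | 0.01 | 0.01 | 98 | 1.67 | 0.49 | - | 0.80 | - |

Table 3: Lasso (L) and Thresholded Lasso (T). Increasing sample
size with m = 100 and 2 locations of $\tau_0$.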

As expected, the probability of correct model selection tends to one for the thresholded scaled
Lasso. For the plain scaled Lasso, on the other hand, this probability reaches at most 11%. As
seen already in Table 1, the problem that the scaled Lasso suffers from is false positives: it fails to
exclude irrelevant variables even as the sample size increases. Finally, and as expected, the penalty
applied ($\lambda$) decreases as n increases.

Table 4 considers different values of the non-zero coefficients to investigate the effect of the scale
of these coefficients. The data is generated as:

• Sample size: n = 100, 200, 1000.

• $\beta$ = a[1, 1, 1, 1, 1, 0, ..., 0], $\delta$ = a[1, -1, 1, -1, 1, 0, ..., 0].

• a = 0.3, 0.5, 1, 2 is the scale of the non-zero parameters.







| a | n | Estimator | MSE | Irrelevant included | Relevant excluded | Perfect selection (%) | $\ell_1$ error | $\ell_\infty$ error | $\hat\tau$ error | C | $\lambda$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.3 | 100 | L | 0.50 | 0.41 | 5.50 | 0 | 2.27 | 0.30 | 0.28 | - | 0.15 |
| 0.3 | 100 | T | 0.52 | 0.02 | 6.22 | 0 | 2.30 | 0.30 | - | 0.46 | - |
| 0.3 | 200 | L | 0.38 | 0.29 | 4.00 | 0 | 1.89 | 0.30 | 0.32 | - | 0.10 |
| 0.3 | 200 | T | 0.39 | 0.01 | 4.56 | 0 | 1.91 | 0.30 | - | 0.44 | - |
| 0.3 | 1000 | L | 0.31 | 0.60 | 1.74 | 1 | 1.38 | 0.30 | 0.10 | - | 0.04 |
| 0.3 | 1000 | T | 0.31 | 0.00 | 2.21 | 5 | 1.38 | 0.30 | - | 0.51 | - |
| 0.5 | 100 | L | 0.75 | 0.57 | 4.49 | 0 | 3.43 | 0.50 | 0.25 | - | 0.15 |
| 0.5 | 100 | T | 0.78 | 0.03 | 5.15 | 0 | 3.45 | 0.50 | - | 0.47 | - |
| 0.5 | 200 | L | 0.57 | 0.50 | 3.21 | 0 | 2.92 | 0.50 | 0.27 | - | 0.10 |
| 0.5 | 200 | T | 0.58 | 0.01 | 3.95 | 0 | 2.93 | 0.50 | - | 0.48 | - |
| 0.5 | 1000 | L | 0.31 | 2.75 | 0.04 | 9 | 1.37 | 0.32 | 0.10 | - | 0.03 |
| 0.5 | 1000 | T | 0.31 | 0.00 | 0.06 | 94 | 1.35 | 0.32 | - | 0.75 | - |
| 1 | 100 | L | 1.87 | 1.12 | 3.52 | 0 | 6.31 | 1.00 | 0.22 | - | 0.18 |
| 1 | 100 | T | 1.94 | 0.05 | 4.21 | 0 | 6.31 | 1.00 | - | 0.56 | - |
| 1 | 200 | L | 1.09 | 3.95 | 1.16 | 0 | 4.46 | 0.86 | 0.21 | - | 0.09 |
| 1 | 200 | T | 1.12 | 0.04 | 1.54 | 39 | 4.39 | 0.86 | - | 0.88 | - |
| 1 | 1000 | L | 0.34 | 2.98 | 0.00 | 9 | 1.43 | 0.35 | 0.08 | - | 0.03 |
| 1 | 1000 | T | 0.34 | 0.00 | 0.01 | 99 | 1.41 | 0.35 | - | 0.83 | - |
| 2 | 100 | L | 4.68 | 5.32 | 2.12 | 0 | 10.01 | 1.76 | 0.20 | - | 0.21 |
| 2 | 100 | T | 4.89 | 0.10 | 2.61 | 21 | 9.80 | 1.76 | - | 1.02 | - |
| 2 | 200 | L | 1.81 | 7.44 | 0.11 | 0 | 4.74 | 1.12 | 0.18 | - | 0.07 |
| 2 | 200 | T | 1.87 | 0.05 | 0.21 | 78 | 4.57 | 1.12 | - | 1.23 | - |
| 2 | 1000 | L | 0.56 | 3.18 | 0.00 | 7 | 1.70 | 0.49 | 0.07 | - | 0.03 |
| 2 | 1000 | T | 0.56 | 0.01 | 0.01 | 98 | 1.68 | 0.49 | - | 0.79 | - |

Table 4: Lasso (L) and Thresholded Lasso (T). Increasing parameter
scale, 3 sample sizes, $\tau_0 = 0.5$.

When the non-zero coefficients are as small as 0.3, perfect model selection does not seem possible unless
n = 1000. On the other hand, the number of relevant variables excluded clearly decreases as n is
increased. In general, no matter what the values of the non-zero coefficients are, all performance
measures improve as n is increased, thus confirming the findings in Table 3. While variable selection
is easier when the non-zero coefficients are well-separated from the zero ones, the MSE and
estimation error of $\hat\alpha$ actually improve as the non-zero coefficients become smaller. The reason for
this is that falsely classifying a non-zero coefficient as zero is less costly in terms of estimation error
when this coefficient is already close to zero than when it is far from zero. On the other hand, $\hat\tau$ is
estimated slightly more precisely as the non-zero coefficients become more separated from the zero
ones.
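
| $m_1$ | $\tau_0$ | Estimator | MSE | Irrelevant included | Relevant excluded | Perfect selection (%) | $\ell_1$ error | $\ell_\infty$ error | $\hat\tau$ error | C | $\lambda$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.3 | L | 0.34 | 0.10 | 0.00 | 90 | 0.36 | 0.25 | 0.25 | - | 0.09 |
| 1 | 0.3 | T | 0.34 | 0.01 | 0.00 | 99 | 0.35 | 0.25 | - | 0.23 | - |
| 1 | 0.4 | L | 0.35 | 0.09 | 0.00 | 91 | 0.36 | 0.26 | 0.22 | - | 0.09 |
| 1 | 0.4 | T | 0.35 | 0.01 | 0.00 | 99 | 0.36 | 0.26 | - | 0.20 | - |
| 1 | 0.5 | L | 0.36 | 0.10 | 0.00 | 90 | 0.37 | 0.27 | 0.23 | - | 0.09 |
| 1 | 0.5 | T | 0.36 | 0.01 | 0.00 | 99 | 0.37 | 0.27 | - | 0.19 | - |
| 5 | 0.3 | L | 1.84 | 1.11 | 0.19 | 31 | 2.59 | 0.90 | 0.19 | - | 0.08 |
| 5 | 0.3 | T | 1.86 | 0.01 | 0.28 | 78 | 2.57 | 0.91 | - | 0.68 | - |
| 5 | 0.4 | L | 2.03 | 1.11 | 0.20 | 32 | 2.67 | 0.93 | 0.18 | - | 0.08 |
| 5 | 0.4 | T | 2.05 | 0.02 | 0.30 | 76 | 2.64 | 0.93 | - | 0.63 | - |
| 5 | 0.5 | L | 2.05 | 1.01 | 0.16 | 35 | 2.57 | 0.91 | 0.17 | - | 0.08 |
| 5 | 0.5 | T | 2.06 | 0.02 | 0.27 | 79 | 2.55 | 0.91 | - | 0.61 | - |
| 10 | 0.3 | L | 5.08 | 2.85 | 0.81 | 5 | 6.54 | 1.36 | 0.19 | - | 0.08 |
| 10 | 0.3 | T | 5.12 | 0.06 | 1.06 | 51 | 6.48 | 1.36 | - | 1.09 | - |
| 10 | 0.4 | L | 4.84 | 2.68 | 0.66 | 7 | 6.17 | 1.28 | 0.18 | - | 0.08 |
| 10 | 0.4 | T | 4.88 | 0.05 | 0.89 | 57 | 6.11 | 1.28 | - | 1.01 | - |
| 10 | 0.5 | L | 5.05 | 2.56 | 0.65 | 7 | 6.10 | 1.25 | 0.18 | - | 0.08 |
| 10 | 0.5 | T | 5.09 | 0.04 | 0.90 | 59 | 6.05 | 1.25 | - | 0.95 | - |
| 25 | 0.3 | L | 19.93 | 9.92 | 4.35 | 0 | 23.72 | 1.87 | 0.20 | - | 0.07 |
| 25 | 0.3 | T | 20.27 | 0.31 | 5.56 | 10 | 23.45 | 1.88 | - | 2.45 | - |
| 25 | 0.4 | L | 19.13 | 10.32 | 3.56 | 0 | 22.76 | 1.88 | 0.20 | - | 0.07 |
| 25 | 0.4 | T | 19.48 | 0.30 | 4.63 | 11 | 22.46 | 1.88 | - | 2.31 | - |
| 25 | 0.5 | L | 18.32 | 9.90 | 3.05 | 0 | 21.53 | 1.75 | 0.23 | - | 0.07 |
| 25 | 0.5 | T | 18.62 | 0.30 | 3.94 | 19 | 21.25 | 1.75 | - | 2.09 | - |

Table 5: Lasso (L) and Thresholded Lasso (T). Increasing number
of non-zero parameters ($m_1$), fixed number of zeros ($m_0 = 100$), and 3 locations of $\tau_0$.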


Finally, Table 5 investigates the effect of reducing the sparsity of the model, i.e. of increasing
the number of non-zero coefficients.

• Sample size: n = 200.

• $\beta$ = [2, ..., 2, 0, ..., 0], $\delta$ = [2, ..., 2, 0, ..., 0].

• $\beta$ and $\delta$ both contain $m_0$ = 100 parameters equal to zero.

• $\beta$ and $\delta$ both contain $m_1$ = 1, 5, 10, 50 parameters equal to 2.

• The length of $\beta$ and $\delta$ is m = $m_0 + m_1$.

Irrespective of the value of $\tau_0$, perfect model selection becomes harder as the number of relevant
variables increases. As our theory is based on the assumption of sparsity, this is not a surprising
finding. The MSE and estimation error of $\hat\alpha$ also increase substantially while the estimation error of $\hat\tau$
is virtually unaffected by the number of relevant variables. Notice that the threshold parameter,
C, increases drastically as the number of non-zero coefficients increases. The explanation for this
is that thresholding seeks to avoid excluding one of the many relevant variables by setting the
threshold higher as there are now more relevant variables at risk of being excluded.


6 Application 


This application aims at investigating the presence of a threshold in the effect of debt on future
GDP growth. The academic discussion regarding the impact of debt on growth, and the existence
of a threshold above which debt becomes severely detrimental to future growth, has been reignited
by Reinhart and Rogoff (2010), who provided evidence for the existence of such a threshold. The
evidence presented by Reinhart and Rogoff (2010) has been challenged by Herndon et al. (2014),
but others have put forth supportive evidence for this thesis; see, among others, Cecchetti et al.
(2012); Caner et al. (2010); Baum et al. (2013).


6.1 Data 


We use the data made available by Cecchetti et al. (2012), which originates mainly from the IMF
and OECD databases. The original data is available at http://www.bis.org/publ/work352.htm and
can also be found in the replication material for this section. The data contains four measures of
debt-to-GDP ratio for:


1. Government debt,

2. Corporate debt,

3. Private debt (corporate + household),

4. Total (non financial institutions) debt (private + government).


Notice that private and total debt are aggregate measures of debt.

The data of Cecchetti et al. (2012) also contains a measure of household debt that we drop as
the series is incomplete. A set of control variables, composed of standard macroeconomic indicators,
is also included in the data.


1. GDP: The logarithm of the per capita GDP.

2. Savings: Gross savings to GDP ratio.

3. ΔPop: Population growth.

4. School: Years spent in secondary education.

5. Open: Openness to trade, exports plus imports over GDP.

6. ΔCPI: Inflation.

7. Dep: Population dependency ratio.

8. LL: Ratio of liquid liabilities to GDP.

9. Crisis: An indicator for banking crisis in the subsequent 5 years. This is taken from Reinhart and Rogoff (2010).


The data is observed for 18 countries from 1980 to 2009 at an annual frequency. We lose one
observation at the start of the sample due to first differencing and five at the end of the sample
due to computing the 5 years ahead average growth rate, so that the full sample is 1981-2004. The
details on the construction of each variable can be found in Cecchetti et al. (2012).


6.2 Results 


In order to evaluate the impact of debt on growth, as well as the potential presence of a threshold
in this effect, we estimate a set of growth regressions. As in Cecchetti et al. (2012), our left hand
side variable is the 5 years forward average rate of growth of per capita GDP. Even though our
estimator is not a panel estimator, we choose to pool the data so as to make our results comparable
with those of Cecchetti et al. (2012) and benefit from a larger sample.

We report a first set of results focusing on the impact of government debt on future GDP growth
in Table 6. We consider 3 different samples: 1981 to 2004 (full sample, 414 observations), 1990 to
2004 (252 observations), and a sample with no overlapping data (5 years, 90 observations). For
the full sample we report results for models estimated with and without country specific dummies
(noted FE in the tables). We do not report the estimated parameters associated with the country
specific dummies.

We estimate the models including every control variable and a single debt measure, that is, 23
parameters to estimate (11 parameters in $\beta$, 11 parameters in $\delta$, and the threshold parameter $\tau$)
including the intercept and the thresholded intercept plus, in some instances, 17 country specific
dummies. The country specific dummies are not penalized. The grid of threshold parameters goes
from the 15th to the 85th centiles of the threshold variable by steps of 5 centiles. We select the
thresholding parameter C by BIC using a grid from 0.1 to 5, so that parameters smaller (in absolute
value) than $C\lambda$ are set to zero by the thresholded scaled Lasso.
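The two grids and the thresholding rule described above can be sketched as follows. This is an illustration under assumptions, not the code used for the application: the debt series is made up, the step of the C grid (0.1) is assumed because only its end points are stated, and the BIC used to pick C is not implemented since its formula is not restated here. The function names are hypothetical.

```python
import numpy as np


def tau_grid(threshold_variable, lo=15, hi=85, step=5):
    """Candidate thresholds: the lo-th to hi-th percentiles in steps of `step`."""
    return np.percentile(threshold_variable, np.arange(lo, hi + 1, step))


def apply_threshold(alpha_hat, c, lam):
    """Set to zero every coefficient smaller than c * lam in absolute value."""
    alpha_hat = np.asarray(alpha_hat, dtype=float)
    return np.where(np.abs(alpha_hat) < c * lam, 0.0, alpha_hat)


debt = np.linspace(0.2, 1.6, 414)    # hypothetical debt-to-GDP ratios
taus = tau_grid(debt)                # 15 candidate values of tau
c_grid = np.linspace(0.1, 5.0, 50)   # assumption: step 0.1 between 0.1 and 5
thresholded = apply_threshold([0.5, -0.05, 0.25, -0.3], c=2.0, lam=0.1)
```

In a full implementation one would evaluate an information criterion for `apply_threshold(alpha_hat, c, lam)` at each `c` in `c_grid` and keep the minimizer.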

Table 6 reports the estimated parameters for the 4 specifications of the model, all including
government debt. The L and T in the header of the table indicate a scaled Lasso estimate ($\hat\beta$, $\hat\delta$)
or thresholded scaled Lasso estimate ($\tilde\beta$, $\tilde\delta$). The upper panel of each table reports $\hat\beta$ and $\tilde\beta$, the
middle panel $\hat\delta$ and $\tilde\delta$, and the lower panel gives the values of $\hat\tau$, $\lambda$, and C. Recall that the effect
of the regressors when the threshold variable is below its threshold is given by $\hat\beta + \hat\delta$ ($\tilde\beta + \tilde\delta$) while
the effect when the threshold variable is above its threshold is given by $\hat\beta$ ($\tilde\beta$) for the scaled Lasso
(thresholded scaled Lasso).


Note: the 18 countries are the US, Japan, Germany, the United Kingdom, France, Italy, Canada, Australia, Austria, Belgium, Denmark,
Finland, Greece, the Netherlands, Norway, Portugal, Spain, and Sweden.

Note: the sample with no overlapping data uses the years 1984, 1989, 1994, 1999, and 2004.
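
| Panel | Variable | L | T | L | T | L | T | L | T |
|---|---|---|---|---|---|---|---|---|---|
| | Threshold: | Government | Government | Government | Government | Government | Government | Government | Government |
| $\beta$ | intercept | 42.43 | 42.43 | 79.611 | 79.611 | 86.416 | 86.416 | 136.988 | 136.988 |
| $\beta$ | GDP | -3.643 | -3.643 | -7.419 | -7.419 | -7.495 | -7.495 | -11.621 | -11.621 |
| $\beta$ | Savings | -0.035 | -0.035 | 0.033 | 0.033 | 0.02 | 0.02 | | |
| $\beta$ | ΔPop | -1.692 | -1.692 | -1.493 | -1.493 | -0.879 | -0.879 | -0.813 | -0.813 |
| $\beta$ | School | 0.426 | 0.426 | 0.507 | 0.507 | 0.095 | 0.095 | -0.082 | -0.082 |
| $\beta$ | Open | 0.003 | | 0.026 | | 0.024 | 0.024 | 0.037 | 0.037 |
| $\beta$ | ΔCPI | -0.061 | -0.061 | -0.056 | -0.056 | -0.157 | -0.157 | -0.252 | -0.252 |
| $\beta$ | Dep | -0.091 | -0.091 | -0.104 | -0.104 | -0.132 | -0.132 | -0.22 | -0.22 |
| $\beta$ | LL | -0.433 | -0.433 | 0.33 | 0.33 | 0.574 | 0.574 | 0.631 | 0.631 |
| $\beta$ | Crisis | -1.277 | -1.277 | -1.58 | -1.58 | -0.949 | -0.949 | -1.396 | -1.396 |
| $\beta$ | Government | -0.713 | -0.713 | | | | | -0.518 | -0.518 |
| $\delta$ | intercept | -12.167 | -12.167 | -1.504 | -1.504 | | | | |
| $\delta$ | GDP | | | | | | | | |
| $\delta$ | Savings | 0.087 | 0.087 | -0.037 | | -0.052 | -0.052 | 0.008 | |
| $\delta$ | ΔPop | 1.563 | 1.563 | 0.42 | 0.42 | 0.222 | 0.222 | 0.61 | 0.61 |
| $\delta$ | School | -0.077 | -0.077 | | | 0.203 | 0.203 | 0.098 | 0.098 |
| $\delta$ | Open | -0.006 | | 0.007 | | 0.012 | | | |
| $\delta$ | ΔCPI | | | | | | | | |
| $\delta$ | Dep | 0.181 | 0.181 | | | -0.035 | -0.035 | | |
| $\delta$ | LL | 0.827 | 0.827 | 0.909 | 0.909 | | | | |
| $\delta$ | Crisis | -0.459 | -0.459 | -0.294 | -0.294 | -1.338 | -1.338 | | |
| $\delta$ | Government | 1.762 | 1.762 | 1.471 | 1.471 | | | -3.23 | -3.23 |
| | $\hat\tau$ | 0.82 | 0.82 | 0.68 | 0.68 | 0.59 | 0.59 | 0.65 | 0.65 |
| | $\lambda$ | 0.007 | 0.007 | 0.015 | 0.015 | 0.007 | 0.007 | 0.008 | 0.008 |
| | C | - | 0.1 | - | 0.3 | - | 0.1 | - | 0.1 |
| | Sample | 1981-2004 | 1981-2004 | 1981-2004 | 1981-2004 | 1990-2004 | 1990-2004 | No overlap | No overlap |
| | FE | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

Table 6: 4 specifications with government debt included as threshold variable and regressor. Estimated
parameters for the Lasso (L) and Thresholded Lasso (T). Empty cells are parameters set to
zero, dashes indicate parameters not included in the model.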
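
| Panel | Variable | L | T | L | T | L | T |
|---|---|---|---|---|---|---|---|
| | Threshold: | Corporate | Corporate | Private | Private | Total | Total |
| $\beta$ | intercept | 140.097 | 140.097 | 126.236 | 126.236 | 134.725 | 134.725 |
| $\beta$ | GDP | -11.642 | -11.642 | -10.616 | -10.616 | -11.396 | -11.396 |
| $\beta$ | Savings | -0.026 | -0.026 | -0.031 | -0.031 | -0.011 | -0.011 |
| $\beta$ | ΔPop | -1.063 | -1.063 | | | -0.995 | -0.995 |
| $\beta$ | School | -0.172 | -0.172 | | | -0.132 | -0.132 |
| $\beta$ | Open | 0.053 | 0.053 | 0.041 | 0.041 | 0.047 | 0.047 |
| $\beta$ | ΔCPI | -0.204 | -0.204 | -0.19 | -0.19 | -0.166 | -0.166 |
| $\beta$ | Dep | -0.242 | -0.242 | -0.191 | -0.191 | -0.235 | -0.235 |
| $\beta$ | LL | 0.332 | 0.332 | 0.316 | 0.316 | 0.376 | 0.376 |
| $\beta$ | Crisis | -0.96 | -0.96 | -0.319 | -0.319 | -0.943 | -0.943 |
| $\beta$ | Corporate | 0.491 | 0.491 | - | - | - | - |
| $\beta$ | Private | - | - | -0.968 | -0.968 | - | - |
| $\beta$ | Total | - | - | - | - | 0.284 | 0.284 |
| $\delta$ | intercept | 8.261 | 8.261 | 2.301 | 2.301 | | |
| $\delta$ | GDP | | | | | | |
| $\delta$ | Savings | -0.243 | -0.243 | 0.022 | 0.022 | | |
| $\delta$ | ΔPop | -2.154 | -2.154 | -1.1 | -1.1 | 2.387 | 2.387 |
| $\delta$ | School | -0.29 | -0.29 | -0.33 | -0.33 | 0.387 | 0.387 |
| $\delta$ | Open | | | -0.007 | | 0.063 | 0.063 |
| $\delta$ | ΔCPI | -0.032 | -0.032 | -0.082 | -0.082 | 0.777 | 0.777 |
| $\delta$ | Dep | | | | | -0.192 | -0.192 |
| $\delta$ | LL | 1.175 | 1.175 | 0.365 | 0.365 | | |
| $\delta$ | Crisis | -2.389 | -2.389 | -1.167 | -1.167 | -31.521 | -31.521 |
| $\delta$ | Corporate | | | - | - | - | - |
| $\delta$ | Private | - | - | 0.563 | 0.563 | - | - |
| $\delta$ | Total | - | - | - | - | | |
| | $\hat\tau$ | 0.69 | 0.69 | 1.62 | 1.62 | 2 | 2 |
| | $\lambda$ | 0.001 | 0.001 | 0.005 | 0.005 | 0.002 | 0.002 |
| | C | - | 0.1 | - | 0.1 | - | 0.1 |
| | Sample | 1981-2004 | 1981-2004 | 1981-2004 | 1981-2004 | 1981-2004 | 1981-2004 |
| | FE | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

Table 7: Growth regressions with corporate, private, or total debt (see header) included both as
threshold variable and as regressor. Estimated parameters, pooled data, Lasso (L) and Thresholded
Lasso (T). Empty cells are parameters set to zero, dashes indicate parameters not included in the
model.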

A large fraction of $\hat\beta$ is non-zero (the Lasso drops a single variable twice), while $\hat\delta$ is more sparse
(the Lasso drops between 2 and 7 variables). The thresholding parameter C is always chosen among
the lowest values in the search grid; this nonetheless results in between 1 and 3 extra parameters
being discarded compared to the scaled Lasso. A threshold ($\hat\tau$) for the effect of government debt on
growth is found at between 60% and 80% of GDP, consistent with the findings of Cecchetti et al.
(2012); Reinhart and Rogoff (2010); Caner et al. (2010); Baum et al. (2013).


The level of GDP is found to have a negative effect on GDP per capita growth as predicted
by the income convergence hypothesis, as do inflation, the dependency ratio, population growth,
and crises. Considering the effect of both $\beta$ and $\delta$, our model indicates in most instances that
government debt has a positive effect below the threshold and a negative effect, or no effect at all,
above the debt threshold. Ceteris paribus, a 10 percentage point increase in the government debt
to GDP ratio, when it is above the threshold, is found to result in a decrease of the average 5
year growth rate of between 0.07% and zero. Looking at this effect of high debt on future growth in
isolation is overly restrictive, though, since there are large changes in the other parameters of the
model when the debt threshold is crossed. This is in particular the case for financial variables.
Interestingly, crises are found to have a more detrimental effect on growth for countries with a
government debt ratio below the threshold, and while liquid liabilities (LL) are beneficial to the
future growth of a country with low debt, this does not appear to be the case when debt is high.
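The arithmetic behind the reported government debt effects can be checked with a short sketch. It uses the government debt coefficients of the first specification of Table 6 ($\beta$ = -0.713, $\delta$ = 1.762); the function name `regime_effects` is illustrative, and the calculation assumes, as the estimated thresholds suggest, that debt is measured as a ratio so that 10 percentage points correspond to 0.10.

```python
def regime_effects(beta, delta):
    """Marginal effect below (beta + delta) and above (beta) the threshold."""
    return {"below": beta + delta, "above": beta}


# Government debt coefficients from the first specification of Table 6.
effects = regime_effects(beta=-0.713, delta=1.762)

# Debt is measured as a ratio, so 10 percentage points correspond to 0.10.
change_above = effects["above"] * 0.10
```

Here `effects["below"]` is positive (about 1.05) while `change_above` is about -0.07, which corresponds to the upper end of the range quoted in the text.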

Table 7 reports estimates for 3 other measures of debt in a model with country dummies and
using the full sample, the same model used in the first two columns of Table 6. The sparsity pattern
in Table 7 is comparable to that of Table 6 and some similarities are found between the estimated
values. Again, the level of per capita GDP is found to have a negative impact on future growth, as
do the dependency ratio, inflation, population growth, and financial crises.

A threshold is always found and identified: 69% for corporate debt, 162% for private debt, and
200% for total debt. The large value of the estimated thresholds for private and total debt can
be explained by the fact that these are aggregate measures of debt and hence of a substantially
larger magnitude than either corporate or government debt. The effect of corporate and total debt
is found to be positive and not directly affected by the threshold whereas the effect of private debt
is negative, and more so when private debt is high. As previously, financial crises are found to have
a stronger negative impact on countries with low debt, though crises are detrimental to growth
irrespective of the level of debt.


7 Conclusion 


In this paper we considered high-dimensional threshold regressions and provided sup-norm oracle
inequalities for the estimation error of the scaled Lasso of Lee et al. (2012). These results are
non-trivial as most research has focused on either $\ell_1$ or $\ell_2$ oracle inequalities. The sup-norm bounds
are shown to be crucial for exact variable selection by means of thresholding. To be precise, we
can distinguish at a much finer scale between zero and non-zero coefficients than would have been
possible if thresholding had been based on either $\ell_1$ or $\ell_2$ oracle inequalities.

We carry out simulations and show that the thresholded scaled Lasso performs well in model
selection. Finally, we estimate a set of growth regressions documenting the existence of a threshold
in the amount of debt relative to GDP. Several parameters change when the threshold is crossed,
making the effect of high debt on future growth unclear.

Future work includes investigating the effect of multiple thresholds.

APPENDIX


The following result is needed in the proofs of Theorems 1 and 2. It is similar to Lemma 6 in
Lee et al. (2012) but allows for random regressors and non-gaussian error terms.


Lemma 2. Let Assumption 1 be satisfied. Then,

$$\left\| \frac{1}{n} X'(\hat\tau) U \right\|_{\ell_\infty} = O_p\left( \sqrt{\frac{\log(m)}{n}} \right).$$

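Proof. First, note that

$$\left\| \frac{1}{n} X'(\hat\tau) U \right\|_{\ell_\infty} \le \sup_{\tau \in T} \left\| \frac{1}{n} X'(\tau) U \right\|_{\ell_\infty}$$

such that it suffices to bound the right hand side. Let $\epsilon > 0$ be arbitrary. By the independence of $(X_1, ..., X_n, U_1, ..., U_n)$ and $(Q_1, ..., Q_n)$
one has for j = 1, ..., m,

$$P\left( \sup_{\tau \in T} \left| \frac{1}{n} \sum_{i=1}^n X_i^{(j)} U_i 1_{\{Q_i < \tau\}} \right| > \epsilon \,\Big|\, (Q_1, ..., Q_n) \right) = P\left( \max_{1 \le k \le n} \left| \frac{1}{n} \sum_{i=1}^k X_i^{(j)} U_i \right| > \epsilon \,\Big|\, (Q_1, ..., Q_n) \right) = P\left( \max_{1 \le k \le n} \left| \frac{1}{n} \sum_{i=1}^k X_i^{(j)} U_i \right| > \epsilon \right) \qquad (7)$$

almost surely, where the first equality used that conditional on $(Q_1, ..., Q_n)$, $(1_{\{Q_1 < \tau\}}, ..., 1_{\{Q_n < \tau\}})$
can only take n different values (and sorted $\{X_i, U_i\}_{i=1}^n$ by $(Q_1, ..., Q_n)$ in ascending order). The
second equality used the independence of $(X_1, ..., X_n, U_1, ..., U_n)$ and $(Q_1, ..., Q_n)$. Next, by Corollary
4 in Montgomery-Smith (1993) there exists a universal constant c > 0 such that

$$P\left( \max_{1 \le k \le n} \left| \frac{1}{n} \sum_{i=1}^k X_i^{(j)} U_i \right| > \epsilon \right) \le c P\left( \left| \frac{1}{n} \sum_{i=1}^n X_i^{(j)} U_i \right| > \frac{\epsilon}{c} \right). \qquad (8)$$

As $X_i^{(j)} U_i$ is subexponential (the product of two subgaussian variables is subexponential) for all
i = 1, ..., n and j = 1, ..., m, Corollary 5.17 in Vershynin (2012) yields

$$P\left( \left| \frac{1}{n} \sum_{i=1}^n X_i^{(j)} U_i \right| > \frac{\epsilon}{c} \right) \le 2 \exp\left( -d \left[ (\epsilon/K)^2 \wedge (\epsilon/K) \right] n \right) \qquad (9)$$

where d > 0 and K = K(c) > 0 are absolute constants. Therefore, choosing $\epsilon = A\sqrt{\log(m)/n}$ for some
$A \ge 1$ yields

$$P\left( \left| \frac{1}{n} \sum_{i=1}^n X_i^{(j)} U_i \right| > \frac{\epsilon}{c} \right) \le 2 \exp\left( -\frac{dA}{K^2 \vee K} \left[ \frac{\log(m)}{n} \wedge \sqrt{\frac{\log(m)}{n}} \right] n \right) \le 2 \exp\left( -\frac{dA}{K^2 \vee K} \log(m) \right)$$

where the second estimate used that $\log(m)/n \to 0$ such that $\log(m)/n$ is smaller than its square root
for n sufficiently large. Hence,

$$P\left( \sup_{\tau \in T} \left| \frac{1}{n} \sum_{i=1}^n X_i^{(j)} U_i 1_{\{Q_i < \tau\}} \right| > \epsilon \,\Big|\, (Q_1, ..., Q_n) \right) \le 2c \exp\left( -\frac{dA}{K^2 \vee K} \log(m) \right)$$

for all j = 1, ..., m almost surely. Taking expectations over $(Q_1, ..., Q_n)$ yields

$$P\left( \sup_{\tau \in T} \left| \frac{1}{n} \sum_{i=1}^n X_i^{(j)} U_i 1_{\{Q_i < \tau\}} \right| > \epsilon \right) \le 2c \exp\left( -\frac{dA}{K^2 \vee K} \log(m) \right). \qquad (10)$$

Therefore, combining (9) and (10), a union bound over 2m terms yields

$$P\left( \sup_{\tau \in T} \left\| \frac{1}{n} X'(\tau) U \right\|_{\ell_\infty} > \epsilon \right) \le 2m(1 + c) \exp\left( -\frac{dA}{K^2 \vee K} \log(m) \right).$$

Choosing A sufficiently large implies that $\sup_{\tau \in T} \left\| \frac{1}{n} X'(\tau) U \right\|_{\ell_\infty} = O_p\left( \sqrt{\log(m)/n} \right)$ using the definition
of $\epsilon = A\sqrt{\log(m)/n}$. □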
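The first equality in (7) rests on the observation that, once the observations are sorted by $Q_i$, the indicator-weighted average is simply a partial sum. The sketch below checks this numerically. It is an illustration only: the supremum is taken over all real $\tau$ rather than over a restricted set T, the summands are simulated as products of standard normals standing in for $X_i^{(j)} U_i$, and the function names are hypothetical.

```python
import numpy as np


def sup_over_tau(w, q, taus):
    """max over tau in `taus` of |(1/n) sum_i w_i 1{q_i < tau}|."""
    return max(abs(np.mean(w * (q < tau))) for tau in taus)


def max_partial_sum(w, q):
    """max over k of |(1/n) sum of the k terms w_i with the smallest q_i|."""
    order = np.argsort(q)
    return np.max(np.abs(np.cumsum(w[order]))) / w.size


rng = np.random.default_rng(0)
n = 50
w = rng.standard_normal(n) * rng.standard_normal(n)  # stands in for X_i^(j) U_i
q = rng.uniform(size=n)

# The indicator vector only changes when tau crosses an observed q_i.
taus = np.append(q, np.inf)
lhs = sup_over_tau(w, q, taus)
rhs = max_partial_sum(w, q)
```

The two quantities `lhs` and `rhs` agree up to floating point error, and a supremum over any finer grid of $\tau$ values cannot exceed `rhs`.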

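Lemma 3. Let Assumption 1 be satisfied. Then, $\sup_{\tau \in T} \max_{1 \le j \le 2m} \|X^{(j)}(\tau)\|_n = O_p(1)$ and
$\min_{1 \le j \le 2m} \|X^{(j)}(\tau_0)\|_n$ is bounded away from zero with probability approaching one.

Proof. Consider the first claim and note that $\sup_{\tau \in T} \max_{1 \le j \le 2m} \|X^{(j)}(\tau)\|_n = \max_{1 \le j \le m} \|X^{(j)}\|_n$.
As $X_1^{(j)}$ is uniformly subgaussian in j = 1, ..., m it also holds that $E\left(X_1^{(j)2}\right)$ is uniformly bounded
(this follows by Lemma 2.2.1 in van der Vaart and Wellner (1996) and the inequalities at the bottom
of page 95 in that reference). Thus, by the triangle inequality and subadditivity of $x \mapsto \sqrt{x}$,

$$\sqrt{\frac{1}{n} \sum_{i=1}^n X_i^{(j)2}} \le \sqrt{\left| \frac{1}{n} \sum_{i=1}^n \left( X_i^{(j)2} - E X_1^{(j)2} \right) \right|} + \sqrt{E X_1^{(j)2}}$$

and hence it suffices to bound $\left| \frac{1}{n} \sum_{i=1}^n \left( X_i^{(j)2} - E X_1^{(j)2} \right) \right|$ or, equivalently, $\left| \|X^{(j)}\|_n^2 - E X_1^{(j)2} \right|$
uniformly in j = 1, ..., m by a constant with probability tending to 1. As the $X_1^{(j)2}$ are uniformly
subexponential (as they are a product of uniformly subgaussian random variables) in j = 1, ..., m,
Corollary 5.17 in Vershynin (2012) implies that for any $\epsilon > 0$ there exist constants c, K > 0 (see
Vershynin (2012) for the exact meaning of the constants) such that

$$P\left( \left| \frac{1}{n} \sum_{i=1}^n \left( X_i^{(j)2} - E X_1^{(j)2} \right) \right| > \epsilon \right) \le 2 \exp\left( -c \left[ (\epsilon/K)^2 \wedge (\epsilon/K) \right] n \right)$$

for all j = 1, ..., m. Now, choosing $\epsilon = K \vee K/c$, the union bound yields that

$$P\left( \max_{1 \le j \le m} \left| \frac{1}{n} \sum_{i=1}^n \left( X_i^{(j)2} - E X_1^{(j)2} \right) \right| > \epsilon \right) \le 2m e^{-n} \to 0$$

as $\log(m)/n \to 0$. Thus, $K \vee K/c$ is large enough to be the sought constant.

Now turn to the second claim and observe that $\min_{1 \le j \le 2m} \|X^{(j)}(\tau_0)\|_n = \min_{1 \le j \le m} \sqrt{\frac{1}{n} \sum_{i=1}^n X_i^{(j)2} 1_{\{Q_i < \tau_0\}}}$.
Note that by Assumption 1,

$$\min_{1 \le j \le m} E\left( X_1^{(j)2} 1_{\{Q_1 < \tau_0\}} \right) = \min_{1 \le j \le m} E\left( X_1^{(j)2} \right) \tau_0 =: r > 0,$$

where the first equality used the independence of $X_i$ and $Q_i$ as well as that $Q_i$ is uniformly distributed
on [0, 1]. Therefore, it suffices to show that

$$\max_{1 \le j \le m} \left| \frac{1}{n} \sum_{i=1}^n X_i^{(j)2} 1_{\{Q_i < \tau_0\}} - E\left( X_1^{(j)2} 1_{\{Q_1 < \tau_0\}} \right) \right| \le r/2$$

with probability tending to one. As $X_1^{(j)2} 1_{\{Q_1 < \tau_0\}}$ is subexponential it follows once more
from Corollary 5.17 in Vershynin (2012) that for $d = K \wedge r/2$

$$P\left( \left| \frac{1}{n} \sum_{i=1}^n X_i^{(j)2} 1_{\{Q_i < \tau_0\}} - E\left( X_1^{(j)2} 1_{\{Q_1 < \tau_0\}} \right) \right| > d \right) \le 2 \exp\left( -c \left[ (d/K)^2 \wedge (d/K) \right] n \right) \le 2 e^{-c d^2 n / K^2}$$

for j = 1, ..., m. Thus, by the union bound

$$P\left( \max_{1 \le j \le m} \left| \frac{1}{n} \sum_{i=1}^n X_i^{(j)2} 1_{\{Q_i < \tau_0\}} - E\left( X_1^{(j)2} 1_{\{Q_1 < \tau_0\}} \right) \right| > d \right) \le 2m e^{-c d^2 n / K^2}$$

which tends to zero as $\log(m)/n \to 0$ by Assumption 1. □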

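Proof of Theorem 1. Note first that when $\delta_0 = 0$, for any random variable V

$$Y_i = X_i'\beta_0 + U_i = X_i'\beta_0 + X_i' 1_{\{Q_i < V\}} \delta_0 + U_i,$$

since $X_i' 1_{\{Q_i < V\}} \delta_0 = 0$. In particular, this is true for $V = \hat\tau$. Next, since $\hat\alpha = (\hat\beta', \hat\delta')'$ satisfies the Karush-Kuhn-Tucker
conditions for a minimum, one has

$$-\frac{1}{n} X'(\hat\tau) \left( Y - X(\hat\tau)\hat\alpha \right) + \lambda D(\hat\tau) z(\hat\tau) = 0$$

where $\|z(\hat\tau)\|_{\ell_\infty} \le 1$ and $z(\hat\tau)_j = \mathrm{sign}(\hat\alpha_j)$ if $\hat\alpha_j \ne 0$. This can be rewritten as

$$\frac{1}{n} X'(\hat\tau) X(\hat\tau) (\hat\alpha - \alpha_0) = \frac{1}{n} X'(\hat\tau) U - \lambda D(\hat\tau) z(\hat\tau),$$

which is equivalent to

$$\Sigma(\hat\tau) (\hat\alpha - \alpha_0) = \left( \Sigma(\hat\tau) - \frac{1}{n} X'(\hat\tau) X(\hat\tau) \right) (\hat\alpha - \alpha_0) + \frac{1}{n} X'(\hat\tau) U - \lambda D(\hat\tau) z(\hat\tau).$$

Next, $\Theta(\tau) = \Sigma(\tau)^{-1}$ exists for all $\tau \in T$ under Assumption 1 when $\Sigma$ has full rank as argued in
the discussion of Assumption 2. In fact, $\kappa = \kappa(s, 3, T) > 0$ with probability tending to one as is
needed in order to invoke Theorem 2 of Lee et al. (2012) below. It follows that $\Sigma(\hat\tau)$ is invertible
with inverse $\Theta(\hat\tau)$. Thus,

$$\hat\alpha - \alpha_0 = \Theta(\hat\tau) \left( \Sigma(\hat\tau) - \frac{1}{n} X'(\hat\tau) X(\hat\tau) \right) (\hat\alpha - \alpha_0) + \Theta(\hat\tau) \frac{1}{n} X'(\hat\tau) U - \lambda \Theta(\hat\tau) D(\hat\tau) z(\hat\tau).$$

Now recall that for matrices A, B and a vector c of compatible dimensions, one has
$\|ABc\|_{\ell_\infty} \le \|A\|_{\ell_\infty} \|Bc\|_{\ell_\infty} \le \|A\|_{\ell_\infty} \|B\|_\infty \|c\|_{\ell_1}$ (see, e.g., Horn and Johnson (2013), Chapter 5). Using this,

$$\begin{aligned}
\|\hat\alpha - \alpha_0\|_{\ell_\infty} &\le \|\Theta(\hat\tau)\|_{\ell_\infty} \left\| \Sigma(\hat\tau) - \frac{1}{n} X'(\hat\tau) X(\hat\tau) \right\|_\infty \|\hat\alpha - \alpha_0\|_{\ell_1} + \|\Theta(\hat\tau)\|_{\ell_\infty} \left\| \frac{1}{n} X'(\hat\tau) U \right\|_{\ell_\infty} + \lambda \|\Theta(\hat\tau)\|_{\ell_\infty} \|D(\hat\tau)\|_{\ell_\infty} \|z(\hat\tau)\|_{\ell_\infty} \\
&\le \sup_{\tau \in T} \|\Theta(\tau)\|_{\ell_\infty} \sup_{\tau \in T} \left\| \Sigma(\tau) - \frac{1}{n} X'(\tau) X(\tau) \right\|_\infty \|\hat\alpha - \alpha_0\|_{\ell_1} + \sup_{\tau \in T} \|\Theta(\tau)\|_{\ell_\infty} \left\| \frac{1}{n} X'(\hat\tau) U \right\|_{\ell_\infty} + \lambda \sup_{\tau \in T} \|\Theta(\tau)\|_{\ell_\infty} \max_{1 \le j \le m} \|X^{(j)}\|_n
\end{aligned} \qquad (11)$$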


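where we have also used $\|z(\hat\tau)\|_{\ell_\infty} \le 1$. Next, note that $\sup_{\tau \in T} \|\Theta(\tau)\|_{\ell_\infty}$ is bounded by assumption.
Furthermore, by Lemma 2, $\left\| \frac{1}{n} X'(\hat\tau) U \right\|_{\ell_\infty} = O_p\left( \sqrt{\log(m)/n} \right)$ while $\max_{1 \le j \le m} \|X^{(j)}\|_n = O_p(1)$
by Lemma 3. Finally, it follows by the arguments on page A6 and the last inequality before
Appendix B in Lee et al. (2012) that $\sup_{\tau \in T} \left\| \Sigma(\tau) - \frac{1}{n} X'(\tau) X(\tau) \right\|_\infty = O_p\left( \sqrt{\log(mn)/n} \right)$ while
$\|\hat\alpha - \alpha_0\|_{\ell_1} = O_p\left( s \sqrt{\log(m)/n} \right)$ by Theorem 2 in the same reference. Using this in (11) yields, with
$\lambda = O\left( \sqrt{\log(m)/n} \right)$,

$$\|\hat\alpha - \alpha_0\|_{\ell_\infty} = O_p\left( s \sqrt{\frac{\log(m)}{n}} \sqrt{\frac{\log(mn)}{n}} + 2 \sqrt{\frac{\log(m)}{n}} \right) = O_p\left( \sqrt{\frac{\log(m)}{n}} \right)$$

as $s \sqrt{\log(mn)/n} \to 0$. □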

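Proof of Theorem 2. First, since $\hat\alpha = (\hat\beta', \hat\delta')'$ satisfies the Karush-Kuhn-Tucker conditions for a
minimum, one has

$$-\frac{1}{n} X'(\hat\tau) \left( Y - X(\hat\tau)\hat\alpha \right) + \lambda D(\hat\tau) z(\hat\tau) = 0$$

where $\|z(\hat\tau)\|_{\ell_\infty} \le 1$ and $z(\hat\tau)_j = \mathrm{sign}(\hat\alpha_j)$ if $\hat\alpha_j \ne 0$. This can be rewritten as

$$-\frac{1}{n} X'(\hat\tau) \left( X(\tau_0)\alpha_0 - X(\hat\tau)\hat\alpha \right) = \frac{1}{n} X'(\hat\tau) U - \lambda D(\hat\tau) z(\hat\tau)$$

which is equivalent to

$$\frac{1}{n} X'(\hat\tau) X(\hat\tau) (\hat\alpha - \alpha_0) - \frac{1}{n} X'(\hat\tau) \left( X(\tau_0) - X(\hat\tau) \right) \alpha_0 = \frac{1}{n} X'(\hat\tau) U - \lambda D(\hat\tau) z(\hat\tau).$$

The above display can be rewritten as

$$\Sigma(\tau_0) (\hat\alpha - \alpha_0) - \frac{1}{n} X'(\hat\tau) \left( X(\tau_0) - X(\hat\tau) \right) \alpha_0 = \left( \Sigma(\tau_0) - \frac{1}{n} X'(\hat\tau) X(\hat\tau) \right) (\hat\alpha - \alpha_0) + \frac{1}{n} X'(\hat\tau) U - \lambda D(\hat\tau) z(\hat\tau).$$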

Next 0(ro) = S(ro)“^ exists under Assumption 1 by the discussion after Assumption 2 as S is 
assumed to exist. In fact, k = k(s, 5,5) > 0 where S = {|t — ro| < ryo} and rjQ = V K^VJX ^ 


i s sati sfied with probability tending to one as is needed in order to invoke Theorem 3 of iLee et al 


( 2012li below (it is even satished when 5 is replaced by T). Thus, one may rewrite the above display 


as 


d - ao = 0(ro)-A'(f) (A(to) - X{t)) oq + 0(ro) (S(ro) - -A'(f)X(f)) (d - ao) 
n n 

+ Q{To)-X'{f)U - XQ{To)D{T)z{f) 
n 


such that arguments similar to those leading to (|llll yield 

^X\f){X{To)-X{f))aQ 


+ ll®('^o)ILoo (^('^o) ---^'('r)-’^(T)) ^||d-ao 
1 


Id 




n 


( 12 ) 


®Here Ki = \YfC\C2 where Ci is the constants proven to exist in Lemma [ 3 ] in the appendix ensuring that 
sup,.gj. maxi<j<2m||N*'^l(r)||^ < C2 with arbitrarily large probability (more precisely, for any e > 0 there exists a C2 
such that suPwgy maxi<j<2m||N(^'’(T)||^ < Ci with probability at least 1 — e). 
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where we used that ||-s('r)||^ < 1- First, note that ||0('^■o)||£ is bounded by assumption. Next, 

denoting by Z(to) and Z(t) the last m columns of X(ro) and X(t}, respectively, one has 


-X'(T)(X(To)-X(T))ao = -X'(T)(Z(To)-Z(T))do 

^ £nr> ^ 


(13) 


By Theorem 3 in Lee et all ( 2012l l one has |f — ro| = Op such that the probability of 

A = ||t —tqI < I can be made arbitrarily large by choosing K > 0 sufficiently large. 


Thus, on A, 


n 


X'if) (Z(ro) - Z(r))) 


1 

< sup 

^oo ^ ' 


<KCis\j{5o)\ 


U) 

log(m) 






n 


by Assumptions 1 and 4. As we have assumed that s| J(5o)| log(m)^/^/YTi —>■ 0, we have in particular 
that 


n 


X'{t) (X(to) - X(r)) ao 


= o. 


log(m) 


n 


(14) 


Next, note that 

1 


n 


(E(to) --A'(f)A(f)) < (E(ro) --A'(ro)X(To)) + - (A'(ro)X(ro) - A'(f)X(r)) 


n 


n 


First, by the subgaussianity of the covariates and the error terms Corollary 5.14 in Vershvnii] ( 2012 ) 


and a union bound yield tha10 


similar to the ones leading to (I14p . one also has 


(S(to) - iA'(ro)X(ro)) 


= Or, 


1 1 " I 

-{X'{To)X{To)-X'{f)X{f)) < sup 


^ Next, by arguments 
log(m) 




n 


on A by Assumption 4. Therefore, as s log(m)^/^/-y/n —>■ 0 (implied by our assumption s| J((5o)| log(m)^/^/-y/n 
0 ), we conclude that 


(S(ro) - -X'(f)X(r)) 


n 


= o„ 


log(m) 


n 


(15) 


Furthermore, by Lemma [51 


= Op and IId - ao||^^ = Op by 


log(m) 


n 


Theorem 3 in Lee et aP ( 2012I L Finally, maxi<j<m||xb41|^ = Op{l) by Lemma [3] which in con¬ 
junction with (flljl and (fT5|) yields in (fT2]) 

||« —ao|L = Op 
where have again used that slog(m)^/^/YTi ^ 0. 

□ 

^Alternatively, the arguments on pages A4-A6 in iLee et all (l2012l l yield a uniform (in r) upper bound on 
||(E(r) — 4A'(r)A(r))I of the order Op which could also be used resulting in only slightly worse 

rates. 
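
Proof of Lemma 1. First, note that

$$\Sigma(\tau) = \begin{pmatrix} \Sigma & \tau\Sigma \\ \tau\Sigma & \tau\Sigma \end{pmatrix}$$

such that by the formula for the inverse of a partitioned matrix with $\Theta = \Sigma^{-1}$

$$\Theta(\tau) = \Sigma^{-1}(\tau) = \frac{1}{1 - \tau} \begin{pmatrix} \Theta & -\Theta \\ -\Theta & \tau^{-1}\Theta \end{pmatrix}. \qquad (16)$$

Thus, it suffices to bound $\|\Theta\|_{\ell_\infty}$. To this end, note that $\Sigma = (1 - \rho) I + \rho \iota\iota'$ where $\iota$ is an m x 1
vector of ones. Thus, by the Sherman-Morrison-Woodbury formula, $\Theta$ exists and equals

$$\Theta = \Sigma^{-1} = \frac{1}{1 - \rho} \left( I - \frac{\rho}{1 - \rho + \rho m} \iota\iota' \right)$$

which implies that (using $\rho/(1 - \rho + \rho m) \le 1$)

$$\|\Theta\|_{\ell_\infty} = \frac{1}{1 - \rho} \left( 1 - \frac{\rho}{1 - \rho + \rho m} + \frac{\rho(m - 1)}{1 - \rho + \rho m} \right) = \frac{1}{1 - \rho} \left( \frac{1 - 3\rho + 2m\rho}{1 - \rho + m\rho} \right) \le \frac{2}{1 - \rho}. \qquad (17)$$

Thus, combining (16) and (17) yields the first claim of the lemma. The second claim follows trivially
from the first. □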
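The closed-form inverse and the norm bound in this proof can be verified numerically. The sketch below is an illustration with arbitrarily chosen values of m, $\rho$ and $\tau$ (assumptions, not values from the paper); the function names are hypothetical, and the block structure of $\Sigma(\tau)$ is the one implied by independence of the regressors from a threshold variable that is uniform on [0, 1].

```python
import numpy as np


def equicorrelation_inverse(m, rho):
    """Inverse of (1 - rho) I + rho * iota iota' via Sherman-Morrison."""
    ones = np.ones((m, m))
    return (np.eye(m) - rho / (1 - rho + rho * m) * ones) / (1 - rho)


def induced_linf_norm(a):
    """Maximum absolute row sum."""
    return np.abs(a).sum(axis=1).max()


m, rho, tau = 6, 0.5, 0.3
sigma = (1 - rho) * np.eye(m) + rho * np.ones((m, m))
theta = equicorrelation_inverse(m, rho)

# Partitioned matrix Sigma(tau) and its closed-form inverse Theta(tau).
sigma_tau = np.block([[sigma, tau * sigma], [tau * sigma, tau * sigma]])
theta_tau = np.block([[theta, -theta], [-theta, theta / tau]]) / (1 - tau)

norm_theta = induced_linf_norm(theta)
closed_form = (1 - 3 * rho + 2 * m * rho) / ((1 - rho) * (1 - rho + m * rho))
```

For these values `norm_theta` matches `closed_form` and stays below the bound $2/(1 - \rho)$.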


Proof of Theorem 3. We consider the zero and non-zero coefficients separately and show that both
groups will be classified correctly. Note that by Theorems 1 and 2 for every $\epsilon > 0$ there exists a
C > 0 such that $\|\hat\alpha - \alpha_0\|_{\ell_\infty} \le C\lambda$ on a set $P$ with probability at least $1 - \epsilon$. The following arguments
all take place on this set. Consider the truly zero coefficients first. To this end, let $j \in J(\alpha_0)^c$ and
note that

$$\max_{j \in J(\alpha_0)^c} |\hat\alpha_j| \le C\lambda < 2C\lambda = H$$

such that $\tilde\alpha_j = 0$ by the definition of the thresholded scaled Lasso.

Next, consider the non-zero coefficients. To this end, let $j \in J(\alpha_0)$ and note that

$$|\hat\alpha_j| \ge \min_{j \in J(\alpha_0)} |\alpha_{0j}| - |\hat\alpha_j - \alpha_{0j}| > 3C\lambda - C\lambda = 2C\lambda = H$$

such that $|\tilde\alpha_j| = |\hat\alpha_j| \ne 0$ by the definition of the thresholded scaled Lasso and the assumption that
$\min_{j \in J(\alpha_0)} |\alpha_{0j}| > 3C\lambda$. □
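The separation argument of this proof can be illustrated numerically: if the sup-norm error is at most $C\lambda$ and every non-zero coefficient exceeds $3C\lambda$ in absolute value, thresholding at $H = 2C\lambda$ recovers the support exactly. The values of C, $\lambda$ and the coefficient vector below are made up for the illustration, and the estimation error is simulated rather than produced by a Lasso fit.

```python
import numpy as np

rng = np.random.default_rng(1)
C, lam = 1.0, 0.1
H = 2 * C * lam  # thresholding level

# Non-zero coefficients all exceed 3 * C * lam = 0.3 in absolute value.
alpha_0 = np.r_[np.array([0.5, -0.4, 0.35]), np.zeros(7)]

# Simulated estimate whose sup-norm error is at most C * lam.
noise = rng.uniform(-C * lam, C * lam, size=alpha_0.size)
alpha_hat = alpha_0 + noise

# Thresholded estimate: keep a coefficient only if it is at least H in size.
alpha_tilde = np.where(np.abs(alpha_hat) >= H, alpha_hat, 0.0)
support_recovered = np.array_equal(alpha_tilde != 0, alpha_0 != 0)
```

Zero coefficients end up below H because their estimates are at most $C\lambda$ in size, while non-zero ones stay above it, so `support_recovered` is true for any noise draw within the stated bound.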

Proof of Theorem Proceeds exactly as the proof of Theorem [3l □ 
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